Abstract. The safety of infinite state systems can be checked by a backward reachability procedure. For certain classes of systems, it is possible to prove the termination of the procedure and hence conclude the decidability of the safety problem. Although backward reachability is property-directed, it can unnecessarily explore (large) portions of the state space of a system which are not required to verify the safety property under consideration. To avoid this, invariants can be used to dramatically prune the search space. The difficulty, however, is to guess appropriate invariants.

In this paper, we present a fully declarative and symbolic approach to the mechanization of backward reachability of infinite state systems manipulating arrays by Satisfiability Modulo Theories solving. Theories are used to specify the topology and the data manipulated by the system. We identify sufficient conditions on the theories to ensure the termination of backward reachability and we show the completeness of a method for invariant synthesis (obtained as the dual of backward reachability), again, under suitable hypotheses on the theories. We also present a pragmatic approach to interleave invariant synthesis and backward reachability so that a fix-point for the set of backward reachable states is more easily obtained. Finally, we discuss heuristics that allow us to derive an implementation of the techniques in the model checker MCMT, showing remarkable speed-ups on a significant set of safety problems extracted from a variety of sources.
1. Introduction

Backward reachability analysis has been widely adopted in model checking of safety properties for infinite state systems (see, e.g., [1]). This verification procedure repeatedly computes pre-images of a set of unsafe states, usually obtained by complementing a safety property that a system should satisfy. Potentially infinite sets of states are represented by constraints so that pre-image computation can be done symbolically. The procedure halts in two cases: either when the current set of (backward) reachable states has a non-empty intersection with the set of initial states (the safety check), in which case the system is unsafe, or when such a set has reached a fix-point, i.e. further application of the transition does not enlarge the set of reachable states (the fix-point check), in which case the system is safe. One of the key insights of backward reachability is that it makes it possible to show the decidability of checking safety properties for some classes of infinite state systems, such as broadcast protocols [331 EZ], lossy channel systems [5], timed networks [6], and parametric and distributed systems with global conditions O 0]. The main ingredient of the technique for proving decidability of safety is the existence of a well-quasi-ordering over the infinite set of states entailing the termination of backward reachability [1].

1.1. Array-based systems and symbolic backward reachability. An array-based system (first introduced in [37]) is a generalization of all the classes of infinite state systems mentioned above. Moreover, it also supports the specification and verification of algorithms manipulating arrays and of fault tolerant systems that are well beyond the paradigms underlying the verification methods mentioned above. Roughly, an array-based system is a transition system which updates one (or more) array variables a. Being parametric in the structures associated to the indexes and the elements of a, the notion of array-based system is quite flexible and allows for the declarative specification of several classes of infinite state systems. For example, consider parametrised systems and the task of specifying their topology: by using no structure at all, indexes are simply identifiers of processes that can only be compared for equality; by using a linear order, indexes are identifiers of processes such that it is possible to distinguish between those to the left or to the right of a process with a particular identifier; by using richer and richer structures (such as trees and graphs), it is possible to specify more and more complex topologies. Similar observations hold also for elements, where it is well-known how to use algebraic structures to specify data structures. Formally, the structure on both indexes and elements is declaratively and uniformly specified by theories, i.e. pairs formed by a (first-order) language and a class of (first-order) structures.

On top of the notion of array-based system, it is possible to design a fully symbolic 
and declarative version of backward reachability for the verification of safety properties 
where sets of backward reachable states are represented by certain classes of first-order 
formulae over the signature induced by the theories over the indexes and the elements of 
the array-based system under consideration. To mechanize this approach, the following 
three requirements are mandatory: 

(i) the class T of (possibly quantified) first-order formulae used to represent sets of states 
is expressive enough to represent interesting classes of systems and safety properties, 

(ii) T is closed under pre-image computation, and 

(iii) the checks for safety and fix-point can be reduced to decidable logical problems (e.g., 
satisfiability) of formulae in T . 
Once requirements (i)-(iii) are satisfied, this technique can be seen as a symbolic version of the model checking techniques of [1] revisited in the declarative framework of first-order logic augmented with theories (as first discussed in [37]). Using a declarative framework has several potential advantages; two of the most important ones are the following. First, the computation of the pre-image (requirement (ii) above) becomes computationally cheap: we only need to build the formula $\phi$ representing the (iterated) pre-images of the set of unsafe states and then put the burden of using suitable data structures to represent $\phi$ on the available (efficient) solver for the logical problems encoding safety and fix-point checks. This is in sharp contrast to almost all other approaches to symbolic model checking of infinite state systems, where the computation of the pre-image is very expensive, as it requires a substantial normalization of the data structure representing the (infinite) sets of states so as to simplify safety and fix-point checks. The second advantage is the possibility to use state-of-the-art Satisfiability Modulo Theories (SMT) solvers, a technology that has been very successful in scaling up various verification techniques, to support both safety and fix-point checks (requirement (iii) above). Unfortunately, the satisfiability problems arising in the backward search algorithm require coping with (universal) quantifiers, and this makes the off-the-shelf use of SMT solvers problematic. In fact, even when using classes of formulae with a decidable satisfiability problem, currently available SMT solvers are not yet mature enough to efficiently discharge formulae containing (universal) quantifiers, despite the fact that this problem has recently attracted a lot of effort (see, e.g., (25J EH E3]). To alleviate this problem, we have designed a general decision procedure for a class of formulae satisfying requirement (i) above, based on quantifier instantiation (see [37] and Theorem 3.3 below); this makes it easier to integrate currently available SMT solvers in the backward reachability procedure. Interestingly, it is possible to describe the symbolic backward reachability procedure by means of a Tableaux-like calculus which offers a good starting point for implementation. In fact, the main loop of MCMT [41], the model checker for array-based systems that we are currently developing, can be easily understood in terms of the rules of the calculus. The current version of the tool uses Yices [31] as the back-end SMT solver. We have chosen Yices among the many available state-of-the-art solvers because it has scored well in many editions of the SMT-COMP competition and because its lightweight API allowed us to easily embed it in MCMT. An interesting line of future work would be to make the tool parametric with respect to the back-end SMT solver so as to permit the user to select the most appropriate one for the problem under consideration.

In our declarative framework, it is also possible to identify sufficient conditions on the theories about indexes and elements of array-based systems that ensure the termination of the symbolic backward reachability procedure. This allows us to derive all the decidability results for the safety problems of the classes of systems mentioned above. Interestingly, the well-quasi-ordering used for the proof of termination can be obtained by combining standard model-theoretic notions (namely, sub-structures and embeddings) with well-known mathematical results for showing that a binary relation is a well-quasi-order (e.g., Dickson's Lemma or Kruskal's Theorem). Contrary to the approach proposed in [1], where some ingenuity is required, in our framework the well-quasi-order is derived uniformly from the class of structures formalizing indexes and elements.



The latest available release of the tool with all the benchmarks discussed in this paper (and more) can 
be downloaded at http://homes.dsi.unimi.it/~ghilardi/mcmt 
1.2. Symbolic backward reachability and invariant synthesis. One of the key advantages of backward reachability over other verification methods is that it is goal-directed, the goal being the set of unsafe states from which pre-images are computed. Despite this, it can unnecessarily explore (large) portions of the symbolic state space of a system which are not required to verify the safety property under consideration. Even worse, in some cases the analysis may not detect a fix-point, thereby causing non-termination. In order to avoid visiting irrelevant parts of the symbolic state space during backward reachability, techniques for analyzing pre-images, over-approximating the set of backward reachable states, and guessing invariants have been devised (see, e.g., [30 } 11 ^ 139 1 [1^ [TB I [51 ] [3^ 1 [T3 l 136], to name a few). The success of these techniques depends crucially on the heuristics used to guess the invariants or compute over-approximations. Our approach is similar in spirit to [IE], but employs techniques which are specific to our different intended application domains.

Along this line of research, we discuss a technique for interleaving pre-image computation and invariant synthesis which tries to eagerly prune irrelevant parts of the search space. Formally, in the context of the declarative framework described above, our main result about invariant synthesis ensures that the technique will find an invariant, provided one exists, under suitable hypotheses which are satisfied by important classes of array-based systems (e.g., mutual exclusion algorithms or cache coherence protocols). The key ingredient in the proof of this result is again the well-quasi-ordering obtained from the model-theoretic notions (sub-structure and embedding) that already played a key role in showing the termination of the backward reachability procedure. In this case, it allows us to finitely characterize the search space of candidate invariants. Although the technique is developed for array-based systems, we believe that the underlying idea can be adapted to other symbolic approaches to model checking (e.g., [HE]).

Although the correctness of our invariant synthesis method is theoretically interesting, its implementation seems to be impractical because of the huge (finite) search space that must be traversed in order to find the desired invariant. In order to make our findings more practically relevant, we study how to integrate invariant synthesis with backward reachability so as to prune the search space of the latter efficiently. To this end, we develop techniques that allow us to analyze a set of backward reachable states and then guess candidate invariants. Such candidate invariants are then proved to be "real" invariants by using a resource-bounded variant of the backward reachability procedure, and afterwards they are used during fix-point checking in the hope that they help prune the search space by increasing the chances to detect a fix-point. Two observations are important. First, the resources of the backward reachability procedure are bounded because we want to obtain invariants in a computationally cheap way. Second, we have complete freedom in the design of the invariant generation techniques, since all candidate invariants are checked to be real invariants before being used by the main backward reachability procedure. As a consequence, (even coarse) abstraction techniques can be used to compute candidate invariants without putting at risk the accuracy of the (un-)safety result returned by the main verification procedure. For concreteness, we discuss two techniques for invariant guessing; both compute over-approximations of the set of backward reachable states. The former, called index abstraction (which resembles the technique of [46]), projects away the indexes in the formula used to describe a set of backward reachable states, while the latter, called signature abstraction (which can be seen as a form of predicate abstraction [44]), projects away the elements of a subset of the array variables by quantifier elimination (if possible). The effectiveness of the proposed invariant synthesis techniques and of their integration in the backward reachability procedure must be judged experimentally. Hence, we have implemented them in MCMT and performed an experimental analysis on several safety problems translated from available model checkers for parametrised systems (e.g., pfs, Undip, the version of UCLID extended with predicate abstraction) or obtained by formalizing programs manipulating arrays (e.g., sorting algorithms). The results confirm the viability and effectiveness of the proposed invariant synthesis techniques, either by finding a fix-point more quickly (when the backward reachability procedure alone was already able to find it) or by making it possible to find a fix-point (when the backward reachability procedure alone did not terminate).



How to read the paper. Given the size of the paper, we identify two tracks for the reader. 
The former is the 'symbolic' track and allows one to focus on the declarative framework, 
the mechanization of the backward reachability procedure, its combination with invariant 
synthesis techniques, and its experimental evaluation. The latter is the 'semantic' track 
which goes into the details of the connection between the syntactic characterization of 
sets of states and the well-quasi-ordering permitting one to prove the termination of the 
backward reachability procedure and the completeness of invariant synthesis. To some 
extent, the two tracks can be read independently. 



• Symbolic track. In Sections 2.1 and 2.3, some preliminary notions underlying the concept of array-based system (Section 3.1) are given. In Section 3, the symbolic version of backward reachability is described, the requirements for its mechanization are considered, namely closure under pre-image computation and decidability of safety and fix-point checks (Section 3.2), and its formalization using a Tableaux-like calculus is presented (Section 3.3). In Section 5.1, the notion of safety invariant is introduced and its synthesis and use to prune the search space of the backward reachability procedure are described; their implementation is considered in Section 6. Particular care has been put in the experimental evaluation of the proposed techniques for invariant synthesis, as illustrated in Section 6.4.



• Semantic track. In Section 2.2, some notions related to the model-theoretic concept of embedding are briefly summarized. In Section 4, it is explained how a pre-order can be defined on sets of states by using the notion of embedding and how this allows us (in case the pre-order is a well-quasi-order) to prove the termination of the backward reachability procedure designed in Section 3. For the sake of completeness, it is also stated that the safety problem for array-based systems is undecidable (Section 4.2); its proof can be found in the Appendix. In Section 5.2, the completeness of an algorithm for invariant synthesis (obtained as the dual of backward reachability) is proved under suitable hypotheses.

In Section 7, we conclude the paper by positioning our work with respect to the state-of-the-art in verification of the safety of infinite state systems and we sketch some lines of future work. For ease of reference, at the end of the paper, we include the table of contents and a figure depicting the two tracks for reading mentioned above.
2. Formal Preliminaries

We assume the usual syntactic (e.g., signature, variable, term, atom, literal, and formula) and semantic (e.g., structure, truth, satisfiability, and validity) notions of first-order logic (see, e.g., [32]). The equality symbol = is included in all signatures considered below. A signature is relational if it does not contain function symbols, and it is quasi-relational if its function symbols are all constants. An expression is a term, an atom, a literal, or a formula. Let $\underline{x}$ be a finite tuple of variables and $\Sigma$ a signature; a $\Sigma(\underline{x})$-expression is an expression built out of the symbols in $\Sigma$ where at most the variables in $\underline{x}$ may occur free (we will write $E(\underline{x})$ to emphasize that $E$ is a $\Sigma(\underline{x})$-expression). Let $\underline{e}$ be a finite sequence of expressions and $\sigma$ a substitution; $\underline{e}\sigma$ is the result of applying the substitution $\sigma$ to each element of the sequence $\underline{e}$.

According to the current practice in the SMT literature [52], a theory $T$ is a pair $(\Sigma, \mathcal{C})$, where $\Sigma$ is a signature and $\mathcal{C}$ is a class of $\Sigma$-structures; the structures in $\mathcal{C}$ are the models of $T$. Below, we let $T = (\Sigma, \mathcal{C})$. A $\Sigma$-formula $\phi$ is $T$-satisfiable if there exists a $\Sigma$-structure $\mathcal{M}$ in $\mathcal{C}$ such that $\phi$ is true in $\mathcal{M}$ under a suitable assignment to the free variables of $\phi$ (in symbols, $\mathcal{M} \models \phi$); it is $T$-valid (in symbols, $T \models \phi$) if its negation is $T$-unsatisfiable. Two formulae $\phi_1$ and $\phi_2$ are $T$-equivalent if $\phi_1 \leftrightarrow \phi_2$ is $T$-valid. The quantifier-free satisfiability modulo the theory $T$ ($SMT(T)$) problem amounts to establishing the $T$-satisfiability of quantifier-free $\Sigma$-formulae.

$T$ admits quantifier elimination iff, given an arbitrary formula $\phi(\underline{x})$, it is always possible to compute a quantifier-free formula $\phi'(\underline{x})$ such that $T \models \forall \underline{x}\,(\phi(\underline{x}) \leftrightarrow \phi'(\underline{x}))$. Linear Arithmetic, Real Arithmetic, acyclic lists, and enumerated data-type theories (see below) are examples of theories that admit elimination of quantifiers.

A theory $T = (\Sigma, \mathcal{C})$ is said to be locally finite iff $\Sigma$ is finite and, for every finite set of variables $\underline{x}$, there are finitely many $\Sigma(\underline{x})$-terms $t_1, \ldots, t_{k_{\underline{x}}}$ such that for every further $\Sigma(\underline{x})$-term $u$, we have that $T \models u = t_i$ (for some $i \in \{1, \ldots, k_{\underline{x}}\}$). The terms $t_1, \ldots, t_{k_{\underline{x}}}$ are called $\Sigma(\underline{x})$-representative terms; if they are effectively computable from $\underline{x}$ (and $t_i$ is computable from $u$), then $T$ is said to be effectively locally finite (in the following, when we say 'locally finite', we in fact always mean 'effectively locally finite'). If $\Sigma$ is relational or quasi-relational, then any $\Sigma$-theory $T$ is locally finite.

An important class of theories, ubiquitously used in verification, formalizes enumerated data-types. An enumerated data-type theory $T$ is a theory in a quasi-relational signature whose class of models contains only a single finite $\Sigma$-structure $\mathcal{M} = (M, \mathcal{I})$ such that for every $m \in M$ there exists a constant $c \in \Sigma$ such that $c^{\mathcal{I}} = m$. For example, enumerated data-type theories can be used to model control locations of processes in parametrised systems (see Example 3.1 below).
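Because every element of the single model of an enumerated data-type theory is the value of some constant, an existential quantifier over such a data-type can be eliminated by finite expansion: $\exists e.\,\phi(e)$ is equivalent to $\phi(c_1) \vee \cdots \vee \phi(c_n)$, where $c_1, \ldots, c_n$ are the constants. The following Python sketch illustrates this idea; the tuple-based formula representation and the names `subst`, `eliminate`, and `holds` are assumptions made for illustration (they are not part of MCMT), and the domain is the four-element one of Example 3.1.

```python
# Toy formula syntax (an assumption for illustration, not MCMT's representation):
#   ("eq", t1, t2) | ("not", f) | ("and", f1, ..., fn) | ("or", f1, ..., fn) | ("exists", var, f)
# Terms are variable names (str) or constants of the enumerated domain (int).

DOMAIN = (1, 2, 3, 4)  # the single model: every element is the value of a constant


def subst(f, var, const):
    """Replace the free occurrences of the variable `var` in `f` by `const`."""
    tag = f[0]
    if tag == "eq":
        return ("eq",) + tuple(const if t == var else t for t in f[1:])
    if tag == "not":
        return ("not", subst(f[1], var, const))
    if tag in ("and", "or"):
        return (tag,) + tuple(subst(g, var, const) for g in f[1:])
    if tag == "exists":
        if f[1] == var:  # `var` is re-bound here: no free occurrences below
            return f
        return ("exists", f[1], subst(f[2], var, const))
    raise ValueError(f"unknown connective {tag!r}")


def eliminate(f):
    """Replace every `exists e. g` by g[c1/e] or ... or g[cn/e], innermost first."""
    tag = f[0]
    if tag == "eq":
        return f
    if tag == "not":
        return ("not", eliminate(f[1]))
    if tag in ("and", "or"):
        return (tag,) + tuple(eliminate(g) for g in f[1:])
    if tag == "exists":
        body = eliminate(f[2])
        return ("or",) + tuple(subst(body, f[1], c) for c in DOMAIN)
    raise ValueError(f"unknown connective {tag!r}")


def is_quantifier_free(f):
    if f[0] == "eq":
        return True
    if f[0] == "exists":
        return False
    return all(is_quantifier_free(g) for g in f[1:])


def holds(f, env):
    """Truth of `f` in the single model under the assignment `env` (str -> int)."""
    def value(t):
        return env[t] if isinstance(t, str) else t

    tag = f[0]
    if tag == "eq":
        return value(f[1]) == value(f[2])
    if tag == "not":
        return not holds(f[1], env)
    if tag == "and":
        return all(holds(g, env) for g in f[1:])
    if tag == "or":
        return any(holds(g, env) for g in f[1:])
    if tag == "exists":
        return any(holds(f[2], {**env, f[1]: c}) for c in DOMAIN)
    raise ValueError(f"unknown connective {tag!r}")


# exists e. (x = e and not e = 4) is equivalent to the quantifier-free formula  not x = 4
phi = ("exists", "e", ("and", ("eq", "x", "e"), ("not", ("eq", "e", 4))))
psi = eliminate(phi)
assert is_quantifier_free(psi)
assert all(holds(psi, {"x": x}) == holds(phi, {"x": x}) == (x != 4) for x in DOMAIN)
```

Each eliminated quantifier multiplies the size of the formula by the number of constants, which is usually acceptable for the small data-types used to model control locations.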



2.1. Case-Defined Functions. In the SMT-LIB format [53], it is possible to use if-then-else constructors when building terms. This may seem to be beyond the realm of first-order logic, but in fact these constructors can be easily eliminated in SMT problems. Since case-defined functions (introduced via nested if-then-else constructors) are quite useful for us too, we briefly explain the underlying formal aspects here. Given a theory $T$, a $T$-partition is a finite set $C_1(\underline{x}), \ldots, C_n(\underline{x})$ of quantifier-free formulae such that $T \models \forall \underline{x} \bigvee_{i=1}^{n} C_i(\underline{x})$ and $T \models \bigwedge_{i \neq j} \forall \underline{x}\, \neg (C_i(\underline{x}) \wedge C_j(\underline{x}))$. A case-definable extension $T' = (\Sigma', \mathcal{C}')$ of a theory $T = (\Sigma, \mathcal{C})$ is obtained from $T$ by applying (finitely many times) the following procedure: (i) take a $T$-partition $C_1(\underline{x}), \ldots, C_n(\underline{x})$ together with $\Sigma$-terms $t_1(\underline{x}), \ldots, t_n(\underline{x})$; (ii) let $\Sigma'$ be $\Sigma \cup \{F\}$, where $F$ is a "fresh" function symbol (i.e. $F \notin \Sigma$) whose arity is equal to the length of $\underline{x}$; (iii) take as $\mathcal{C}'$ the class of $\Sigma'$-structures $\mathcal{M}$ whose $\Sigma$-reduct is a model of $T$ and such that $\mathcal{M} \models \bigwedge_{i=1}^{n} \forall \underline{x}\, (C_i(\underline{x}) \rightarrow F(\underline{x}) = t_i(\underline{x}))$. Thus a case-definable extension $T'$ of a theory $T$ contains finitely many additional function symbols, called case-defined functions.

Lemma 2.1. Let $T'$ be a case-definable extension of $T$; for every formula $\phi'$ in the signature of $T'$ it is possible to compute a formula $\phi$ in the signature of $T$ such that $\phi$ and $\phi'$ are $T'$-equivalent.

Proof. It is sufficient to show the claim for an atomic $\phi'$ containing a single occurrence of a case-defined function: if this holds, one can get the general statement by the replacement theorem for equivalent formulae (the procedure must be iterated until all case-defined additional function symbols are eliminated). Let $\phi'$ be atomic and let it contain a sub-term of the kind $F(\underline{a})$ in position $p$. Then $\phi'$ is $T'$-equivalent to $\bigvee_i (C_i(\underline{a}) \wedge \phi'[t_i(\underline{a})]_p)$. Here the $C_i$'s are the partition formulae for the case definition of $F$ and the $t_i$'s are the related 'value' terms; the notation $\phi'[t_i(\underline{a})]_p$ means the formula obtained from $\phi'$ by putting $t_i(\underline{a})$ in position $p$. □
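The construction of a case-defined function and the elimination step used in the proof of Lemma 2.1 can be illustrated semantically. In the sketch below (a hypothetical encoding, not MCMT's), the guards $C_i$ and the value terms $t_i$ are Python callables, $T$-validity is approximated by exhaustive checking over a small finite domain, and an atom containing $F(x)$ is given as a function of the value of $F(x)$; the helper `eliminate_F` builds $\bigvee_i (C_i(x) \wedge \phi'[t_i(x)])$.

```python
from typing import Callable, Iterable, List, Tuple

Guard = Callable[[int], bool]
Term = Callable[[int], int]

# Finite stand-in for the models of T (assumption): validity is checked by enumeration.
DOMAIN = range(10)

# A T-partition C_1, C_2, C_3 over one variable x, paired with value terms t_1, t_2, t_3.
CASES: List[Tuple[Guard, Term]] = [
    (lambda x: x < 3, lambda x: 0),
    (lambda x: 3 <= x < 7, lambda x: x - 3),
    (lambda x: x >= 7, lambda x: 9 - x),
]


def is_partition(cases: List[Tuple[Guard, Term]], domain: Iterable[int]) -> bool:
    """Exhaustive and pairwise disjoint: exactly one guard holds at every point."""
    return all(sum(1 for guard, _ in cases if guard(x)) == 1 for x in domain)


def case_defined(cases: List[Tuple[Guard, Term]]) -> Term:
    """Interpretation of the fresh symbol F satisfying C_i(x) -> F(x) = t_i(x)."""
    def F(x: int) -> int:
        for guard, term in cases:
            if guard(x):
                return term(x)
        raise ValueError("the guards do not cover x")
    return F


def eliminate_F(atom: Callable[[int], bool], cases: List[Tuple[Guard, Term]]) -> Guard:
    """One step of Lemma 2.1: atom(F(x)) becomes OR_i (C_i(x) and atom(t_i(x)))."""
    return lambda x: any(guard(x) and atom(term(x)) for guard, term in cases)


F = case_defined(CASES)
atom = lambda v: v == 1         # the atom F(x) = 1, with the occurrence F(x) abstracted as v
phi = eliminate_F(atom, CASES)  # mentions only the guards C_i and the terms t_i, not F
assert is_partition(CASES, DOMAIN)
assert all(phi(x) == atom(F(x)) for x in DOMAIN)
```

The equivalence relies on the partition property: exactly one disjunct can be true at each point, and in that disjunct $t_i(x)$ is the value of $F(x)$.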

Notice that a case-definable extension $T'$ of $T$ is a conservative extension of $T$, i.e. formulae in the signature of $T$ are $T$-satisfiable iff they are $T'$-satisfiable (this is because, as far as the signature of $T$ is concerned, the two theories have 'the same models'). Thus, by Lemma 2.1, $T$ and $T'$ are basically the same theory and, by abuse of notation, we shall write $T$ instead of $T'$.



2.2. Embeddings. We summarize some basic model-theoretic notions that will be used in Sections 4 and 5 below (for more details, the interested reader is pointed to standard textbooks in model theory, such as [22]).

A $\Sigma$-embedding (or, simply, an embedding) between two $\Sigma$-structures $\mathcal{M} = (M, \mathcal{I})$ and $\mathcal{N} = (N, \mathcal{J})$ is any mapping $\mu : M \to N$ between the corresponding support sets satisfying the following three conditions: (a) $\mu$ is an injective function; (b) $\mu$ is an algebraic homomorphism, that is, for every $n$-ary function symbol $f$ and for every $a_1, \ldots, a_n \in M$, we have $f^{\mathcal{N}}(\mu(a_1), \ldots, \mu(a_n)) = \mu(f^{\mathcal{M}}(a_1, \ldots, a_n))$; (c) $\mu$ preserves and reflects predicates, i.e. for every $n$-ary predicate symbol $P$, we have $(a_1, \ldots, a_n) \in P^{\mathcal{M}}$ iff $(\mu(a_1), \ldots, \mu(a_n)) \in P^{\mathcal{N}}$.
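For finite structures, conditions (a)-(c) can be checked by brute force. The sketch below assumes an ad hoc dictionary representation of finite $\Sigma$-structures (constants are treated as 0-ary functions); `is_embedding` and `linear_order` are hypothetical helpers introduced only for illustration.

```python
from itertools import product

# Assumed representation of a finite Sigma-structure:
#   {"dom": [elements], "funcs": {name: (arity, {args: value})}, "preds": {name: (arity, {args})}}
# Constants are functions of arity 0, i.e. tables of the form {(): value}.


def is_embedding(mu, M, N):
    """Check conditions (a)-(c) for the map mu (a dict on M's domain) from M to N."""
    dom = M["dom"]
    # (a) mu is injective
    if len({mu[m] for m in dom}) != len(dom):
        return False
    # (b) f^N(mu(a1), ..., mu(an)) = mu(f^M(a1, ..., an)) for every function symbol f
    for f, (arity, f_M) in M["funcs"].items():
        f_N = N["funcs"][f][1]
        for args in product(dom, repeat=arity):
            if f_N[tuple(mu[x] for x in args)] != mu[f_M[args]]:
                return False
    # (c) (a1, ..., an) in P^M  iff  (mu(a1), ..., mu(an)) in P^N, for every predicate P
    for p, (arity, p_M) in M["preds"].items():
        p_N = N["preds"][p][1]
        for args in product(dom, repeat=arity):
            if (args in p_M) != (tuple(mu[x] for x in args) in p_N):
                return False
    return True


def linear_order(elems):
    """A structure in the signature {<} (no function symbols)."""
    return {"dom": list(elems), "funcs": {},
            "preds": {"<": (2, {(x, y) for x in elems for y in elems if x < y})}}


M2, N3 = linear_order([0, 1]), linear_order([0, 1, 2])
assert is_embedding({0: 0, 1: 2}, M2, N3)      # order-preserving injection
assert not is_embedding({0: 2, 1: 0}, M2, N3)  # does not preserve <
assert not is_embedding({0: 1, 1: 1}, M2, N3)  # not injective
```

Condition (c) is checked in both directions at once by comparing the two membership tests, which is what "preserves and reflects" requires.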

If $M \subseteq N$ and the embedding $\mu : \mathcal{M} \to \mathcal{N}$ is just the identity inclusion $M \subseteq N$, we say that $\mathcal{M}$ is a substructure of $\mathcal{N}$ or that $\mathcal{N}$ is a superstructure of $\mathcal{M}$. Notice that a substructure of $\mathcal{N}$ is nothing but a subset of the support set of $\mathcal{N}$ which is closed under the $\Sigma$-operations and whose $\Sigma$-structure is inherited from $\mathcal{N}$ by restriction. In fact, given $\mathcal{N} = (N, \mathcal{J})$ and $G \subseteq N$, there exists the smallest substructure of $\mathcal{N}$ containing $G$ in its support set. This is called the substructure generated by $G$, and its support set can be characterized as the set of the elements $b \in N$ such that $t^{\mathcal{N}}(\underline{a}) = b$ for some $\Sigma$-term $t$ and some finite tuple $\underline{a}$ from $G$ (when we write $t^{\mathcal{N}}(\underline{a}) = b$, we mean that $(\mathcal{N}, \alpha) \models t(\underline{x}) = y$ for an assignment $\alpha$ mapping the $\underline{a}$ to the $\underline{x}$ and $b$ to $y$).
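When $\mathcal{N}$ is finite, the support of the substructure generated by $G$ can be computed as the least superset of $G$ closed under the interpretations of the function symbols (constants included), which coincides with the set of values $t^{\mathcal{N}}(\underline{a})$ described above. Here is a minimal sketch, assuming an ad hoc dictionary representation of finite structures; `generated_support` is a hypothetical name.

```python
from itertools import product


def generated_support(N, G):
    """Support of the substructure of the finite structure N generated by G.

    N uses the assumed representation {"dom": [...], "funcs": {name: (arity, table)}, ...}.
    The result is the least superset of G closed under every function of N; constants
    (arity 0) are always included, matching the values t^N(a) of terms over tuples from G.
    """
    support = set(G)
    changed = True
    while changed:  # saturate until no new value appears; terminates since N is finite
        changed = False
        for arity, f_N in N["funcs"].values():
            for args in product(tuple(support), repeat=arity):
                value = f_N[args]
                if value not in support:
                    support.add(value)
                    changed = True
    return support


# (Z_12, f) with f(x) = x + 4 mod 12, and the same structure extended with a constant c = 6
Z12 = {"dom": list(range(12)),
       "funcs": {"f": (1, {(x,): (x + 4) % 12 for x in range(12)})}, "preds": {}}
Z12c = {**Z12, "funcs": {**Z12["funcs"], "c": (0, {(): 6})}}
assert generated_support(Z12, {1}) == {1, 5, 9}
assert generated_support(Z12c, {1}) == {1, 2, 5, 6, 9, 10}
```

Adding the constant enlarges the generated substructure even though $G$ is unchanged, because every substructure must contain the interpretations of the constants.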

Below, we will make frequent use of the easy, but fundamental, fact that the truth of a universal (resp. existential) sentence is preserved through substructures (resp. through superstructures). A universal (resp. existential) sentence is obtained by prefixing a string of universal (resp. existential) quantifiers to a quantifier-free formula.
2.3. A many-sorted framework. From now on, we use many-sorted first-order logic. All notions introduced above can be easily adapted to a many-sorted framework. In the rest of the paper, we fix (i) a theory $T_I = (\Sigma_I, \mathcal{C}_I)$ whose only sort symbol is INDEX; (ii) a theory $T_E = (\Sigma_E, \mathcal{C}_E)$ for data whose only sort symbol is ELEM (the class $\mathcal{C}_E$ of models of this theory is usually a singleton). The theory $\mathcal{A}_I^E = (\Sigma, \mathcal{C})$ of arrays with indexes in $T_I$ and elements in $T_E$ is obtained as the combination of $T_I$ and $T_E$ as follows: INDEX, ELEM, and ARRAY are the only sort symbols of $\mathcal{A}_I^E$; the signature is $\Sigma := \Sigma_I \cup \Sigma_E \cup \{\_[\_]\}$, where $\_[\_]$ has type ARRAY, INDEX $\to$ ELEM (intuitively, $a[i]$ denotes the element stored in the array $a$ at index $i$); a three-sorted structure $\mathcal{M} = (\mathrm{INDEX}^{\mathcal{M}}, \mathrm{ELEM}^{\mathcal{M}}, \mathrm{ARRAY}^{\mathcal{M}}, \mathcal{I})$ is in $\mathcal{C}$ iff $\mathrm{ARRAY}^{\mathcal{M}}$ is the set of (total) functions from $\mathrm{INDEX}^{\mathcal{M}}$ to $\mathrm{ELEM}^{\mathcal{M}}$, the function symbol $\_[\_]$ is interpreted as function application, and $\mathcal{M}_I = (\mathrm{INDEX}^{\mathcal{M}}, \mathcal{I}_{|\Sigma_I})$, $\mathcal{M}_E = (\mathrm{ELEM}^{\mathcal{M}}, \mathcal{I}_{|\Sigma_E})$ are models of $T_I$ and $T_E$, respectively (here $\mathcal{I}_{|\Sigma_X}$ is the restriction of the interpretation $\mathcal{I}$ to the symbols in $\Sigma_X$ for $X \in \{I, E\}$).

Notational conventions. For the sake of brevity, we introduce the following notational conventions: $d, e$ range over variables of sort ELEM, $a$ over variables of sort ARRAY, and $i, j, k, z$ over variables of sort INDEX. An underlined variable name abbreviates a tuple of variables of unspecified (but finite) length and, if $\underline{i} := i_1, \ldots, i_n$, the notation $a[\underline{i}]$ abbreviates the tuple of terms $a[i_1], \ldots, a[i_n]$. Possibly sub/super-scripted expressions of the form $\phi(\underline{i}, \underline{e})$, $\psi(\underline{i}, \underline{e})$ denote quantifier-free $(\Sigma_I \cup \Sigma_E)$-formulae in which at most the variables $\underline{i} \cup \underline{e}$ occur. Also, $\phi(\underline{i}, \underline{t}/\underline{e})$ (or simply $\phi(\underline{i}, \underline{t})$) abbreviates the substitution of the $\Sigma$-terms $\underline{t}$ for the variables $\underline{e}$. Thus, for instance, $\phi(\underline{i}, a[\underline{i}])$ denotes the formula obtained by replacing $\underline{e}$ with $a[\underline{i}]$ in a quantifier-free formula $\phi(\underline{i}, \underline{e})$.



3. Backward Reachability

Following [32], we focus on a particular yet large class of array-based systems corresponding to guarded assignments.



3.1. Array-based Systems. A (guarded assignment) array-based (transition) system (for $(T_I, T_E)$) is a triple $S = (a, I, \tau)$ where (i) $a$ is the state variable of sort ARRAY;² (ii) $I(a)$ is the initial $\Sigma(a)$-formula; and (iii) $\tau(a, a')$ is the transition $(\Sigma \cup \Sigma_D)(a, a')$-formula, where $a'$ is a renamed copy of $a$ and $\Sigma_D$ is a finite set of case-defined function symbols not in $\Sigma_I \cup \Sigma_E$. Below, we also assume $I(a)$ to be a $\forall^I$-formula, i.e. a formula of the form $\forall \underline{i}.\,\phi(\underline{i}, a[\underline{i}])$, and $\tau(a, a')$ to be in functional form, i.e. a disjunction of formulae of the form

$$\exists \underline{i}\,\big(\phi_L(\underline{i}, a[\underline{i}]) \wedge \forall j\; a'[j] = F_G(\underline{i}, a[\underline{i}], j, a[j])\big) \tag{3.1}$$

where $\phi_L$ is the guard (also called the local component in [37]), and $F_G$ is a case-defined function (called the global component in [37]). To understand why we say that formulae (3.1) are 'in functional form', consider $\lambda$-abstraction; then, the sub-formula $\forall j\; a'[j] = F_G(\underline{i}, a[\underline{i}], j, a[j])$ can be rewritten as $a' = \lambda j.\,F_G(\underline{i}, a[\underline{i}], j, a[j])$. In [37], we adopted a more liberal format for transitions; the format of this paper, however, is sufficient to formalize all relevant examples we have met so far. Results in this paper extend in a straightforward way to the case in which $T_E$ is assumed to have quantifier elimination and (3.1) is allowed to have existentially quantified variables ranging over data. This extension is crucial to formalize, e.g., non-deterministic updates or timed networks [20].

² For the sake of simplicity, we limit ourselves to array-based systems having just one variable $a$ of sort ARRAY. All the definitions and results can be easily generalized to the case of several variables of sort ARRAY. In the examples, we will consider cases where more than one variable is required and, in addition, the theory $T_E$ is many-sorted.
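The functional reading of (3.1) suggests a direct way to compute successor states when the set of indexes is finite. The sketch below assumes a single existentially quantified index (the tuple $\underline{i}$ in (3.1) has length one), a finite index set $0, \ldots, n-1$, and Python callables for the guard $\phi_L$ and for $F_G$; the helper names are hypothetical. The example guard and update are those of the fourth transition of the MESI protocol in Example 3.1.

```python
from typing import Callable, Iterator, Tuple

Array = Tuple[int, ...]  # a state: a[k] for indexes k = 0..n-1 (finite index set, assumption)


def successors(guard: Callable[[int, int], bool],
               update: Callable[[int, int, int, int], int],
               a: Array) -> Iterator[Array]:
    """Successors of a under  exists i (guard(i, a[i]) and forall j a'[j] = update(i, a[i], j, a[j])).

    For each index i satisfying the guard, the new array is the function
    lambda j. update(i, a[i], j, a[j]), tabulated over all indexes j.
    """
    n = len(a)
    for i in range(n):
        if guard(i, a[i]):
            yield tuple(update(i, a[i], j, a[j]) for j in range(n))


# Fourth transition of Example 3.1: guard a[i] = 2, update "if j = i then 1 else a[j]"
guard = lambda i, ai: ai == 2
update = lambda i, ai, j, aj: 1 if j == i else aj
print(list(successors(guard, update, (2, 4, 2))))  # [(1, 4, 2), (2, 4, 1)]
```

Each witness of the guard yields one successor, mirroring the existential quantifier over $\underline{i}$ in (3.1).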

Given an array-based system $S = (a, I, \tau)$ and a formula $U(a)$, (an instance of) the safety problem is to establish whether there exists a natural number $n$ such that the formula

$$I(a_0) \wedge \tau(a_0, a_1) \wedge \cdots \wedge \tau(a_{n-1}, a_n) \wedge U(a_n) \tag{3.2}$$

is $\mathcal{A}_I^E$-satisfiable. If there is no such $n$, then $S$ is safe (w.r.t. $U$); otherwise, it is unsafe, since the $\mathcal{A}_I^E$-satisfiability of (3.2) implies the existence of a run (of length $n$) leading the system from a state in $I$ to a state in $U$. From now on, we assume $U(a)$ to be an $\exists^I$-formula, i.e. a formula of the form $\exists \underline{i}.\,\phi(\underline{i}, a[\underline{i}])$.
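For a fixed finite instance (a fixed number of indexes and a finite data domain), the satisfiability of (3.2) for a given $n$ amounts to the existence of a path of exactly $n$ transitions from an initial state to an unsafe one. The sketch below enumerates such paths depth-first. It is only an explicit-state illustration of the unrolling, not the SMT-based check, and both `unsafe_run` and the two-cell toy system are hypothetical.

```python
from typing import Callable, Iterable, Iterator, List, Optional, Tuple

Array = Tuple[int, ...]


def unsafe_run(init: Iterable[Array],
               succ: Callable[[Array], Iterator[Array]],
               unsafe: Callable[[Array], bool],
               n: int) -> Optional[List[Array]]:
    """Return a_0, ..., a_n satisfying (3.2) for this fixed n, or None if there is none."""
    def extend(path: List[Array]) -> Optional[List[Array]]:
        if len(path) == n + 1:
            return path if unsafe(path[-1]) else None
        for b in succ(path[-1]):
            found = extend(path + [b])
            if found is not None:
                return found
        return None

    for a0 in init:
        found = extend([a0])
        if found is not None:
            return found
    return None


# Toy system (hypothetical): two cells over {0, 1}; I: forall i. a[i] = 0;
# tau: exists i (a[i] = 0 and a' = lambda j. if j = i then 1 else a[j]);
# U: exists i1, i2 (i1 != i2 and a[i1] = 1 and a[i2] = 1)
def toy_succ(a: Array) -> Iterator[Array]:
    for i in range(len(a)):
        if a[i] == 0:
            yield tuple(1 if j == i else a[j] for j in range(len(a)))


def toy_unsafe(a: Array) -> bool:
    return any(a[i1] == 1 and a[i2] == 1
               for i1 in range(len(a)) for i2 in range(len(a)) if i1 != i2)


print(unsafe_run([(0, 0)], toy_succ, toy_unsafe, 2))  # [(0, 0), (1, 0), (1, 1)]
```

Since (3.2) requires exactly $n$ transitions, the toy system has an unsafe run for $n = 2$ but none for $n = 1$ or $n = 3$.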

We illustrate the above notions by considering the MESI cache coherence protocol, taken from the extended version of [2].

Example 3.1. Let $T_I$ be the pure theory of equality and $T_E$ be the enumerated data-type theory with four constants denoted by the numerals from 1 to 4. Each numeral corresponds to a control location of a cache: 1 to modified, 2 to exclusive, 3 to shared, and 4 to invalid.

Initially, all caches are invalid and the formula characterizing the set of initial states is $\forall i.\, a[i] = 4$. There are four transitions. In the first (resp. second) transition, a cache in state invalid (resp. shared) goes to the state exclusive and invalidates all the other caches. Formally, these can be encoded with formulae as follows:

$$\exists i.\,(a[i] = 4 \wedge a' = \lambda j.\,(\mathtt{if}\ (j = i)\ \mathtt{then}\ 2\ \mathtt{else}\ 4)) \quad \text{and}$$

$$\exists i.\,(a[i] = 3 \wedge a' = \lambda j.\,(\mathtt{if}\ (j = i)\ \mathtt{then}\ 2\ \mathtt{else}\ 4))$$

In the third transition, a cache in state invalid goes to the state shared and so do all other caches:

$$\exists i.\,(a[i] = 4 \wedge a' = \lambda j.\,3).$$

In the fourth and last transition, a cache in state exclusive can move to the state modified (the other caches maintain their current state):

$$\exists i.\,(a[i] = 2 \wedge a' = \lambda j.\,(\mathtt{if}\ (j = i)\ \mathtt{then}\ 1\ \mathtt{else}\ a[j])).$$

To be safe, the protocol should not reach a state in which there is a cache in state modified and another cache in state modified or in state shared. Thus, one can take

$$\exists i_1\, \exists i_2.\,(i_1 \neq i_2 \wedge a[i_1] = 1 \wedge (a[i_2] = 1 \vee a[i_2] = 3))$$

as the unsafety formula. □
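To make Example 3.1 concrete, the sketch below encodes the MESI transitions for a fixed number $n$ of caches and explores all states reachable from the initial one. This is only an illustration: it checks one instance at a time, whereas the point of the symbolic approach is to establish safety for every number of caches. The constants and function names are assumptions.

```python
from collections import deque
from typing import Iterator, Set, Tuple

MODIFIED, EXCLUSIVE, SHARED, INVALID = 1, 2, 3, 4
Array = Tuple[int, ...]


def mesi_successors(a: Array) -> Iterator[Array]:
    """The four transitions of Example 3.1, one witness index i at a time."""
    n = len(a)
    for i in range(n):
        if a[i] in (INVALID, SHARED):  # transitions 1 and 2: i becomes exclusive, others invalid
            yield tuple(EXCLUSIVE if j == i else INVALID for j in range(n))
        if a[i] == INVALID:            # transition 3: every cache becomes shared
            yield tuple(SHARED for _ in range(n))
        if a[i] == EXCLUSIVE:          # transition 4: i becomes modified, others unchanged
            yield tuple(MODIFIED if j == i else a[j] for j in range(n))


def mesi_unsafe(a: Array) -> bool:
    """exists i1, i2 (i1 != i2 and a[i1] = 1 and (a[i2] = 1 or a[i2] = 3))."""
    n = len(a)
    return any(a[i1] == MODIFIED and a[i2] in (MODIFIED, SHARED)
               for i1 in range(n) for i2 in range(n) if i1 != i2)


def reachable(n: int) -> Set[Array]:
    """Forward breadth-first exploration from forall i. a[i] = 4, for exactly n caches."""
    start = tuple(INVALID for _ in range(n))
    seen, frontier = {start}, deque([start])
    while frontier:
        for b in mesi_successors(frontier.popleft()):
            if b not in seen:
                seen.add(b)
                frontier.append(b)
    return seen


for n in range(1, 6):
    assert not any(mesi_unsafe(a) for a in reachable(n))
```

Transitions 1 and 2 produce the same post-state, so the sketch merges their guards; since reachable states are collected in a set, this does not change the result.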

The reader with some experience in infinite state model checking may wonder how it is possible to encode in our framework transitions with 'global conditions', i.e. guards requiring a universal quantification over indexes. Indeed, the format (3.1) for transitions is clearly too restrictive for this purpose. However, it is possible to overcome this limitation by using the stopping failures model introduced in the literature about distributed algorithms (see, e.g., [47]): according to this model, processes may crash at any time and do not play any role in the rest of the execution of the protocol (they "disappear"). In this model, there is no need to check the universal conditions of a transition; rather, the transition is taken and any process not satisfying the global condition is assumed to crash. In this way, we obtain an over-approximation of the original system admitting more runs, and any safety certification obtained for this over-approximation is also a safety certification for the original model. Of course, the converse does not hold in general, and spurious error traces may be obtained. Interestingly, the approximated model can be obtained from the original system by simple syntactic transformations of the formulae encoding the transitions requiring the universal conditions. For more details concerning the implementation of the approximated model in MCMT, the reader is referred to [38]. A more exhaustive discussion of the use of a similar approximated model can be found in [2], [3], 08].

    function BReach(U : ∃^I-formula)
      P ← U; B ← ⊥;
      while (P ∧ ¬B is A_I^E-sat.) do
        if (I ∧ P is A_I^E-sat.) then return unsafe;
        B ← P ∨ B;
        P ← Pre(τ, P);
      end
      return (safe, B);

    (a)

    function SInv(U : ∃^I-formula)
      P ← ChooseCover(U); B ← ⊥;
      while (P ∧ ¬B is A_I^E-sat.) do
        if (I ∧ P is A_I^E-sat.) then return failure;
        B ← P ∨ B;
        P ← ChooseCover(Pre(τ, P));
      end
      return (success, ¬B);

    (b)

Figure 1: The basic backward reachability (a) and the invariant synthesis (b) algorithms



3.2. Backward Reachable States. A general approach to solve instances of the safety
problem is based on computing the set of backward reachable states. For $n \geq 0$, the
$n$-pre-image of a formula $K(a)$ is $\mathit{Pre}^0(\tau, K) := K$ and $\mathit{Pre}^{n+1}(\tau, K) := \mathit{Pre}(\tau, \mathit{Pre}^n(\tau, K))$,
where

$$\mathit{Pre}(\tau, K) := \exists a'.\,(\tau(a, a') \wedge K(a')). \qquad (3.3)$$

Given $S = (a, I, \tau)$ and $U(a)$, the formula $\mathit{Pre}^n(\tau, U)$ describes the set of states that are
backward reachable from $U$ in exactly $n$ steps (for $n \geq 0$). At the end of the $n$-th iteration of
the loop (counting from 0), the basic backward reachability algorithm, depicted in Figure 1(a),
stores in the variable $B$ the formula $BR^n(\tau, U) := \bigvee_{j=0}^{n} \mathit{Pre}^j(\tau, U)$, representing the set of
states which are backward reachable from the states in $U$ in at most $n$ steps (whereas the
variable $P$ stores the formula $\mathit{Pre}^{n+1}(\tau, U)$). While computing $BR^n(\tau, U)$, BReach also checks
whether the system is unsafe (cf. line 3, which can be read as '$I \wedge \mathit{Pre}^n(\tau, U)$ is $\mathcal{A}^E_I$-satisfiable')
or a fix-point has been reached (cf. line 2, whose loop condition can be read as
'$\neg(BR^{n+1}(\tau, U) \to BR^n(\tau, U))$ is $\mathcal{A}^E_I$-satisfiable' or, equivalently, as '$BR^{n+1}(\tau, U) \to BR^n(\tau, U)$
is not $\mathcal{A}^E_I$-valid'; the loop exits as soon as this condition fails). When BReach reports that the
system is safe (cf. line 7), the variable $B$ stores a formula describing the set of states which
are backward reachable from $U$; this set is also a fix-point.
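To make the loop of Figure 1(a) concrete, here is a minimal Python sketch in which, purely for illustration, formulae are replaced by explicit finite sets of states of a fixed instance: satisfiability becomes non-emptiness, disjunction becomes union, and $\mathit{Pre}$ is computed by scanning a finite universe. The names `breach`, `universe`, and `step` are hypothetical; the procedure described in the paper manipulates $\exists^I$-formulae with an SMT solver instead.

```python
def breach(universe, init, unsafe, step):
    """Explicit-set analogue of BReach (Figure 1(a)).

    universe: finite set of states; init, unsafe: subsets of universe;
    step: function mapping a state to the set of its successors (the relation tau).
    """

    def pre(target):
        # Pre(tau, K): the states having at least one successor in K.
        return {s for s in universe if step(s) & target}

    p, b = set(unsafe), set()          # line 1: P <- U; B <- bottom
    while p - b:                       # line 2: is P /\ not B satisfiable?
        if init & p:                   # line 3: is I /\ P satisfiable?
            return "unsafe", None
        b = p | b                      # line 4: B <- P \/ B
        p = pre(p)                     # line 5: P <- Pre(tau, P)
    return "safe", b                   # line 7


# Usage on a hypothetical toy counter over 0..9 that can only add 2.
universe = set(range(10))
step = lambda s: {s + 2} if s + 2 < 10 else set()
print(breach(universe, {0}, {7}, step))  # safe: odd values are never reached from 0
print(breach(universe, {0}, {6}, step))  # unsafe: 0 -> 2 -> 4 -> 6
```

In the safe case, the returned set $B$ is the set of states backward reachable from the unsafe ones, which is a fix-point of the loop, mirroring the role of $BR^n(\tau, U)$.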

Of course, for BReach (Figure 1(a)) to be a true (possibly non-terminating) procedure,
it is mandatory that (i) $\exists^I$-formulae are closed under pre-image computation and (ii) both
the $\mathcal{A}^E_I$-satisfiability test for safety (line 3) and that for fix-point (line 2) are effective.

Concerning (i), it is sufficient to use the following result from [42].³

³The proposition may be read as the characterization of a weakest liberal pre-condition transformer [29]
for array-based systems.

Proposition 3.2. Let $K(a) := \exists k\, \phi(k, a[k])$ and $\tau(a, a') := \bigvee_{h=1}^{m} \exists i\,(\phi_L^h(i, a[i]) \wedge a' = \lambda j.\, F_G^h(i, a[i], j, a[j]))$. Then, $\mathit{Pre}(\tau, K)$ is $\mathcal{A}^E_I$-equivalent to an (effectively computable) $\exists^I$-formula.

Proof. Let $\tau_h$ be one of the $m$ disjuncts of $\tau$. Using the $\lambda$-abstraction formulation and a
single $\beta$-reduction step, it is clear that $\mathit{Pre}(\tau_h, K)$ is $\mathcal{A}^E_I$-equivalent to the following $\exists^I$-formula

$$\exists i\, \exists k.\,(\phi_L^h(i, a[i]) \wedge \phi(k, F_G^h(i, a[i], k, a[k]))) \qquad (3.4)$$

where $k$ is the tuple $k_1, \ldots, k_l$ and $\phi(k, F_G^h(i, a[i], k, a[k]))$ is the formula obtained from
$\phi(k, a'[k])$ by replacing $a'[k_s]$ with $F_G^h(i, a[i], k_s, a[k_s])$, for $s = 1, \ldots, l$. Now it is sufficient to
eliminate the $F_G^h$ as shown in Lemma 2.1. As a final step, the existential quantifiers can be
moved in front of the disjunction arising from the $m$ disjuncts $\tau_1, \ldots, \tau_m$. □
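As a worked instance of the proof (our own computation), take $K$ to be the unsafety formula $U$ of Example 3.1 and write $\tau_4$ for its fourth transition and $\mathit{ite}$ for if-then-else. The $\beta$-reduction step gives the first formula below; eliminating the case-defined function by splitting on $i_1 = i$ and $i_2 = i$ (which cannot both hold, since $i_1 \neq i_2$) gives the three disjuncts that follow.

```latex
\begin{aligned}
\mathit{Pre}(\tau_4, U)
  \equiv\ & \exists i\,\exists i_1\,\exists i_2.\,\big(a[i] = 2 \wedge i_1 \neq i_2
            \wedge \mathit{ite}(i_1 = i, 1, a[i_1]) = 1 \\
          & \qquad \wedge\, (\mathit{ite}(i_2 = i, 1, a[i_2]) = 1
            \vee \mathit{ite}(i_2 = i, 1, a[i_2]) = 3)\big) \\
  \equiv\ & \exists i\,\exists i_2.\,\big(i \neq i_2 \wedge a[i] = 2
            \wedge (a[i_2] = 1 \vee a[i_2] = 3)\big)
          && \text{case } i_1 = i \\
  \vee\ & \exists i\,\exists i_1.\,\big(i \neq i_1 \wedge a[i] = 2 \wedge a[i_1] = 1\big)
          && \text{case } i_2 = i \\
  \vee\ & \exists i\,\exists i_1\,\exists i_2.\,\big(i \neq i_1 \wedge i \neq i_2
            \wedge i_1 \neq i_2 \wedge a[i] = 2 \wedge a[i_1] = 1
            \wedge (a[i_2] = 1 \vee a[i_2] = 3)\big)
          && \text{case } i_1, i_2 \neq i
\end{aligned}
```

Both the second and the third disjunct entail the first one, so $\mathit{Pre}(\tau_4, U)$ is $\mathcal{A}^E_I$-equivalent to the first disjunct alone: some cache is exclusive and a different cache is modified or shared.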



The proof and the algorithm underlying Proposition 3.2 are quite simple. This is in 
sharp contrast to most approaches to infinite state model checking available in the literature 
(e.g., [21 E]) that use special data structures (such as strings with constraints) to represent 
sets of states. These special data structures can be considered as normal forms when
compared to our formulae. In this respect, our framework is more flexible since, although
it can use normal forms (when these can be cheaply computed), it is not obliged to do so.
The drawback is that safety and fix-point checks may become computationally much more
expensive. In particular, the bottleneck is the handling of the quantified variables in the
prefix of $\exists^I$-formulae, which may grow considerably at each pre-image computation: notice
that in (3.4) the prefix $\exists k$ of $K$ is extended with $\exists i$. This and other issues
which are relevant for the implementation of our framework are discussed in [42, 40, 41].

Concerning the mechanization of the safety and fix-point checks (point (ii) above),
observe that the formulae involved in the satisfiability checks are $I \wedge BR^n(\tau, U)$ and
$\neg(BR^n(\tau, U) \to BR^{n-1}(\tau, U))$. Since we have closure under pre-image computation, both
formulae are of the form $\exists a\, \exists i\, \forall j\, \psi(i, j, a[i], a[j])$, where $\psi$ is quantifier free: we call these
sentences $\exists^{A,I}\forall^I$-sentences [37].

Theorem 3.3. The $\mathcal{A}^E_I$-satisfiability of $\exists^{A,I}\forall^I$-sentences is decidable if (I) $T_I$ is locally
finite and is closed under substructures⁴ and (II) the SMT($T_I$) and SMT($T_E$) problems are
decidable. Under the same hypotheses, an $\exists^{A,I}\forall^I$-sentence is $\mathcal{A}^E_I$-satisfiable iff
it is satisfiable in a finite index model (a finite index model is a model $\mathcal{M}$ in which the set
$\mathtt{INDEX}^{\mathcal{M}}$ has finite cardinality).
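A naive way to exploit the finite-index-model part of Theorem 3.3 is sketched below in Python, under the simplifying assumptions (ours) that $T_I$ is the pure theory of equality and $T_E$ is a finite enumerated datatype. Then $T_I$ is closed under substructures and universal formulae are preserved under substructures, so a satisfiable sentence $\exists a\, \exists i\, \forall j\, \psi$ has a model whose index set consists only of the values of $i$; hence it suffices to search index sets of size at most $|i|$. This is a brute-force illustration of the decidability argument, not the instantiation-based procedure outlined next in the text; the function name and the encoding of $\psi$ as a Python predicate are hypothetical.

```python
from itertools import product


def sat_exists_forall(n_exists, n_forall, elements, psi):
    """Decide satisfiability of  exists a. exists i. forall j. psi(i, j, a)
    by searching finite index models with at most max(1, n_exists) indexes.

    Assumes T_I = pure equality (indexes are only compared with ==) and
    T_E = the finite enumerated datatype `elements`.
    psi(i, j, a) receives tuples i, j of indexes and the array a as a tuple.
    """
    for size in range(1, max(1, n_exists) + 1):
        indexes = range(size)
        for a in product(elements, repeat=size):           # candidate arrays
            for i in product(indexes, repeat=n_exists):     # witnesses for i
                if all(psi(i, j, a) for j in product(indexes, repeat=n_forall)):
                    return True
    return False


LOCS = (1, 2, 3, 4)  # control locations of Example 3.1

# Safety test I /\ U of Example 3.1: forall j. a[j] = 4 plus the unsafety formula.
init_and_unsafe = lambda i, j, a: (
    i[0] != i[1] and a[i[0]] == 1 and a[i[1]] in (1, 3) and a[j[0]] == 4
)
print(sat_exists_forall(2, 1, LOCS, init_and_unsafe))  # False

# exists i1 i2. forall j. (i1 != i2 /\ a[j] != 2) is satisfiable with two indexes.
print(sat_exists_forall(2, 1, LOCS, lambda i, j, a: i[0] != i[1] and a[j[0]] != 2))  # True
```

The number of candidate assignments grows exponentially with the number of existential variables, which is one reason why practical procedures rely on instantiation and heuristics rather than on model enumeration.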



A generalization of Theorem 3.3 can be found in the extended version of [37] and is
reported in Appendix A (with a proof) to make this paper self-contained. The proof of
Theorem 3.3 is the starting point to develop a satisfiability procedure for formulae of the
form $\exists a\, \exists i\, \forall j\, \psi(i, j, a[i], a[j])$, consisting of the following steps: (a) the variables $a, i$ are
Skolemized away; (b) the variables $j$ are instantiated in all possible ways by using the
representative $\Sigma_I$-terms; (c) the resulting combined problem is purified and an arrangement
(i.e. an equivalence relation) over the shared index variables is guessed; (d) the positive literals
from this arrangement are propagated to the $T_E$-literals (this is a variant of the Nelson-Oppen
schema adopted in 'theory connections', see [H]); (e) finally, the purified constraints
are passed to the theory solvers for $T_I$ and $T_E$, respectively. From the implementation
viewpoint, powerful heuristics are needed [40] to keep the potential combinatorial explosion
in step (b) under control. Fortunately, the adoption of a certain format for formulae (called
'primitive differentiated', see below for details) makes steps (c) and (d) redundant (see [30]
for more on this point).

⁴By this we mean that if $\mathcal{M}$ is a model of $T_I$ and $\mathcal{N}$ is a substructure of $\mathcal{M}$, then $\mathcal{N}$ is a model of $T_I$ as
well.



Hypothesis (I) from Theorem 3.3 concerns the topology of the system (not the data
manipulated by the components of the system) and its intuitive meaning can be easily
explained when the signature is relational: in that case, local finiteness is guaranteed
and closure under substructures says that if some elements are deleted from a model of $T_I$,
we still get a model of $T_I$ (i.e. the topology does not change under elimination of elements).
For example, Hypothesis (I) is true for (finite) sets, linear orders, graphs, and forests, while it
does not hold for 'rings', because, after deleting one of their elements, they are no longer rings.
We emphasize that it is not possible to weaken Hypothesis (I) on the theory $T_I$. Indeed, it
is possible to show that any weakening yields undecidable fragments of the theory of arrays
over integers [17] (as shown in Appendix A). Furthermore, we observe that Hypothesis
(I) is not too restrictive because, as said above, it concerns only the topology of the system.
So, for example, the topology of virtually any cache coherence protocol (see Example 3.1)
can be formalized by finite sets, while that of standard mutual exclusion protocols can be
formalized by linear orders.

We summarize our working hypotheses in the following.

Assumption 3.4. We fix an array-based system $S = (a, I, \tau)$ such that the initial formula $I$
is a $\forall^I$-formula, and the transition formula $\tau(a, a')$ is $\bigvee_{h=1}^{m} \tau_h(a, a')$, where $\tau_h$ is a formula
of the form (3.1) for $h = 1, \ldots, m$. We suppose that $\exists^I$-formulae are used to describe the
set of unsafe states. Finally, we assume that hypotheses (I) and (II) of Theorem 3.3 are
satisfied.



3.3. Tableaux-like Implementation of Backward Reachability. A naive implementation
of the algorithm in Figure 1(a) does not scale up. The main problem is the size of
the formula $BR^n(\tau, U)$, which contains many redundant or unsatisfiable sub-formulae. We
now discuss how Tableaux-like techniques can be used to circumvent these difficulties. We
need one more definition: an $\exists^I$-formula $\exists i_1 \cdots \exists i_n\, \phi$ is said to be primitive iff $\phi$ is a
conjunction of literals and is said to be differentiated iff $\phi$ contains as a conjunct the negative
literal $i_k \neq i_l$ for all $1 \leq k < l \leq n$. By applying various distributive laws together with the
rewriting rules

$$\exists j\,(i = j \wedge \theta) \rightsquigarrow \theta(i/j) \qquad \text{and} \qquad \theta \rightsquigarrow (\theta \wedge i = j) \vee (\theta \wedge i \neq j) \qquad (3.5)$$

it is always possible to transform every $\exists^I$-formula into a disjunction of primitive
differentiated ones.
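The case splitting of (3.5) can be sketched in Python for a primitive formula $\exists i_1 \cdots \exists i_n\, \phi$ (disjunctive normal form is assumed to have been computed already). Splitting on every equality $i_k = i_l$ and then eliminating the equalities by substitution yields one disjunct per partition of the index variables: variables in the same block are identified, and variables in different blocks are declared distinct. Encoding literals as tuples whose string components are index variables is a hypothetical choice made only for this illustration.

```python
def set_partitions(items):
    """Yield every partition of the list `items` into non-empty blocks."""
    if not items:
        yield []
        return
    first, rest = items[0], items[1:]
    for partition in set_partitions(rest):
        # `first` in a block of its own ...
        yield [[first]] + partition
        # ... or added to one of the existing blocks.
        for k in range(len(partition)):
            yield partition[:k] + [[first] + partition[k]] + partition[k + 1:]


def differentiate(index_vars, literals):
    """Split  exists index_vars. AND(literals)  into primitive differentiated
    disjuncts, returned as (existential variables, literals) pairs."""
    disjuncts = []
    for partition in set_partitions(list(index_vars)):
        # Variables in the same block are identified with the block's first element
        # (rule: exists j.(i = j and theta) ~> theta(i/j)).
        rep = {v: block[0] for block in partition for v in block}
        body = [tuple(rep.get(x, x) if isinstance(x, str) else x for x in lit)
                for lit in literals]
        # Variables in different blocks denote distinct indexes.
        reps = [block[0] for block in partition]
        body += [("!=", r, q) for k, r in enumerate(reps) for q in reps[k + 1:]]
        disjuncts.append((reps, body))
    return disjuncts


# exists i1 i2. (a[i1] = 1 /\ a[i2] = 3), where ("a=", v, c) stands for a[v] = c.
for reps, body in differentiate(["i1", "i2"], [("a=", "i1", 1), ("a=", "i2", 3)]):
    print(reps, body)
```

The second disjunct produced here requires $a[i_1] = 1 \wedge a[i_1] = 3$ and is therefore unsatisfiable; the number of disjuncts is the Bell number of the number of index variables, which shows why keeping the existential prefix small matters.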

We initialize our tableau with the $\exists^I$-formula $U(a)$ representing the set of unsafe states.
The key observation is to revisit the computation of the pre-image as the following inference
rule (we use square brackets to indicate the applicability condition of the rule):

$$\frac{K}{\mathit{Pre}(\tau_1, K) \mid \cdots \mid \mathit{Pre}(\tau_m, K)}\ \mathsf{PreImg} \qquad [K \text{ is primitive differentiated}]$$

where $\mathit{Pre}(\tau_h, K)$ computes the $\exists^I$-formula which is $\mathcal{A}^E_I$-equivalent to the pre-image of $K$
w.r.t. $\tau_h$ (this is possible according to the proof of Proposition 3.2).
Since the $\exists^I$-formulae labeling the consequents of the rule PreImg may not be primitive
and differentiated, we need the following Beta rule

$$\frac{K}{K_1 \mid \cdots \mid K_n}\ \mathsf{Beta}$$

where $K$ is transformed by applying rewriting rules like (3.5) together with standard
distributive laws, in order to get $K_1, \ldots, K_n$ which are primitive, differentiated and whose
disjunction is $\mathcal{A}^E_I$-equivalent to $K$.

By repeatedly applying the above rules, it is possible to build a tree whose nodes
are labelled by $\exists^I$-formulae describing the set of backward reachable states. Indeed, it
is not difficult to see that the disjunction of the $\exists^I$-formulae labelling all the nodes in
the (potentially infinite) tree is $\mathcal{A}^E_I$-equivalent to the (infinite) disjunction of the formulae
$BR^n(\tau, U)$, where $\tau := \bigvee_{h=1}^{m} \tau_h$. However, there is no need to fully expand our tree. For
example, it is useless to apply the rule PreImg to a node $v$ labelled by an $\exists^I$-formula which
is $\mathcal{A}^E_I$-unsatisfiable, as all the formulae labelling nodes in the sub-tree rooted at $v$ will also be
$\mathcal{A}^E_I$-unsatisfiable. This observation can be formalized by the following rule closing a branch
in the tree (we mark the terminal node of a closed branch by $\times$):

$$\frac{K}{\times}\ \mathsf{NotAppl} \qquad [K \text{ is } \mathcal{A}^E_I\text{-unsatisfiable}]$$

This rule is effective since $\exists^I$-formulae are a subset of $\exists^{A,I}\forall^I$-sentences and the
$\mathcal{A}^E_I$-satisfiability of these formulae is decidable by Theorem 3.3.

According to procedure BReach, there are two more situations in which we can stop
expanding a branch in the tree. One terminates the branch because of the safety test (cf.
line 3 of Figure 1(a)):

$$\frac{K}{\mathit{UnSafe}}\ \mathsf{Safety} \qquad [I \wedge K \text{ is } \mathcal{A}^E_I\text{-satisfiable}]$$

Interestingly, if we label with $\tau_h$ the edge connecting a node labeled with $K$ with that labeled
with $\mathit{Pre}(\tau_h, K)$ when applying rule PreImg, then the transitions $\tau_{h_1}, \ldots, \tau_{h_e}$ labelling the
edges in the branch terminated by UnSafe (from the leaf node to the root node) give an error
trace, i.e. a sequence of transitions leading the array-based system from a state satisfying $I$
to one satisfying $U$. Again, this rule is effective since $I \wedge K$ is equivalent to an $\exists^{A,I}\forall^I$-sentence
and its $\mathcal{A}^E_I$-satisfiability is decidable by Theorem 3.3. The other situation in which
one can close a branch corresponds to the fix-point test (cf. line 2 of Figure 1(a)):

$$\frac{K}{\times}\ \mathsf{FixPoint} \qquad [K \wedge \textstyle\bigwedge\{\neg K' \mid K' \prec K\} \text{ is } \mathcal{A}^E_I\text{-unsatisfiable}]$$

where $K' \prec K$ means that $K'$ is a primitive differentiated $\exists^I$-formula labeling a node
preceding the node labeled by $K$ (nodes can be ordered according to the strategy for expanding
the tree). Once more, this rule is effective since $K \wedge \bigwedge\{\neg K' \mid K' \prec K\}$ can be
straightforwardly transformed into an $\exists^{A,I}\forall^I$-sentence whose $\mathcal{A}^E_I$-satisfiability is decidable
by Theorem 3.3.



As mentioned above, from the implementation point of view, clever heuristics are needed
to reduce the number of instances that have to be generated for the satisfiability test of
Theorem 3.3 and to trivialize the recognition of the unsatisfiable premise of the rule NotAppl.
In addition, the satisfiability checks required by rule FixPoint should be performed
incrementally by considering formulae in reverse chronological order (i.e. the pre-images
generated later are added first and those generated earlier are possibly added later). The
interested reader is pointed to [ID] for a more exhaustive discussion of these issues.
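For intuition only, the tableau rules can be mimicked on explicit finite sets of states of a fixed instance, where satisfiability becomes non-emptiness and entailment becomes inclusion. The Python sketch below (hypothetical names, breadth-first expansion strategy assumed) applies PreImg with one child per transition, closes branches with NotAppl, Safety, and FixPoint, and reads the error trace off the edge labels from the leaf to the root.

```python
from collections import deque


def tableau(universe, init, unsafe, transitions):
    """Breadth-first, explicit-set analogue of PreImg, NotAppl, Safety and FixPoint.

    transitions maps a transition name tau_h to a function state -> set of successors.
    Returns ("unsafe", trace) or ("safe", union of the labels of expanded nodes).
    """

    def pre(name, target):
        step = transitions[name]
        return frozenset(s for s in universe if step(s) & target)

    covered = set()                             # labels of the nodes expanded so far
    queue = deque([(frozenset(unsafe), [])])    # (label K, edge labels root -> node)
    while queue:
        label, path = queue.popleft()
        if not label:                           # NotAppl: K is unsatisfiable
            continue
        if label & init:                        # Safety: I /\ K is satisfiable
            return "unsafe", path[::-1]         # leaf -> root = execution order
        if label <= covered:                    # FixPoint: K entails earlier labels
            continue
        covered |= label
        for name in transitions:                # PreImg: one child per transition
            queue.append((pre(name, label), path + [name]))
    return "safe", covered


# Hypothetical toy system over 0..4: starting from 0, can state 3 be reached?
universe = set(range(5))
moves = {
    "plus1": lambda s: {s + 1} if s + 1 < 5 else set(),
    "plus2": lambda s: {s + 2} if s + 2 < 5 else set(),
}
print(tableau(universe, {0}, {3}, moves))  # ('unsafe', ['plus2', 'plus1'])
```

The returned trace is read in execution order: from state 0, applying plus2 and then plus1 reaches the unsafe state 3. Removing plus1 from `moves` makes the same query safe, since only even states remain reachable.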

A final remark is in order. One may think that the main difference between our framework
for model checking infinite state systems and other approaches lies just in the technology
used for constraint solving: our system, mcmt, uses an SMT solver while other tools
(such as PFS [2]) use efficient dedicated algorithms. This is only part of the story. In fact,
mcmt usually produces many fewer nodes than other systems while visiting the tree whose
nodes are labelled with the formulae representing sets of backward reachable states. This
is so because our approach is fully declarative and mcmt symbolically represents not only
the data but also the topology of the system. The other model checkers use constraints
only to represent the data manipulated by the system, while the topology is encoded by
using an ad hoc data structure, which usually requires more effort to represent sets of states.
To illustrate this fundamental aspect, we consider a simple (but tricky) example.

Example 3.5. Let $T_I$ be the theory of linear orders and $T_E$ be an enumerated datatype
with 15 constants denoted by the numerals from 1 to 15. Consider the following
parametrized system having 7 transitions and 15 control locations:

• the first transition allows process $i$ to move from location 1 to location 2 provided there
is a process $j$ to the right of $i$ (i.e. $i < j$ holds) which is on location 9;

• similarly, the second transition allows process $i$ to move from location 2 to location 3
provided there is a process $j$ to the right of $i$ which is on location 10, and so on (the
last transition allows process $i$ to move from location 7 to location 8 provided there is a
process $j$ to the right of $i$ which is on location 15).

Initially, all processes are in location 1. We consider the following safety problem: is it
possible for a process to reach location 8? The answer is obviously no, since no transition
ever moves a process to one of the locations 9 to 15, so none of the guards can be satisfied.

mcmt solves the problem by generating 7 nodes in about 0.02 seconds on a standard
laptop. By contrast, PFS takes about 4 minutes on the same computer and generates
thousands of constraints. Why is this so? The point is that tools like PFS do not symbolically
represent the system topology and need to specify the relative positions of all the involved
processes. In contrast, mcmt can handle partial information like "there exist 7 processes
to the right of $i$ whose locations are from 9 to 15, respectively" precisely because it is based on
a deductive engine, i.e. the SMT solver.

Thus, mcmt represents a fully declarative approach to infinite state model checking
that, when coupled with appropriate heuristics, should pave the way to the verification of
systems with increasingly complex topologies that other tools cannot handle. □
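The partial information mentioned in Example 3.5 can be made explicit by computing, by hand, the chain of pre-images along the transitions in reverse order. For this illustration only, $\tau_l$ ($l = 1, \ldots, 7$) names the transition moving process $i$ from location $l$ to location $l + 1$ when some process $j$ with $i < j$ is on location $l + 8$. In $\mathit{Pre}(\tau_{7-m}, K_m)$ the moving process cannot be one of the $j_l$ (they must end in locations 9 to 15), and if it differs from $i$ the resulting disjunct entails $K_m$ and is closed by FixPoint; keeping the remaining case gives $K_{m+1}$. For every other transition the moving process must differ from $i$ and from the $j_l$, so those pre-images entail $K_m$ as well.

```latex
\begin{aligned}
K_0 &:= \exists i.\, a[i] = 8 \\
K_1 &:= \exists i\,\exists j_7.\,(i < j_7 \wedge a[i] = 7 \wedge a[j_7] = 15) \\
K_2 &:= \exists i\,\exists j_6\,\exists j_7.\,(i < j_6 \wedge i < j_7 \wedge a[i] = 6
        \wedge a[j_6] = 14 \wedge a[j_7] = 15) \\
    &\ \ \vdots \\
K_7 &:= \exists i\,\exists j_1 \cdots \exists j_7.\,\Big(a[i] = 1 \wedge
        \textstyle\bigwedge_{l=1}^{7} \big(i < j_l \wedge a[j_l] = l + 8\big)\Big)
\end{aligned}
```

$K_7$ states exactly that there are 7 processes to the right of $i$ on locations 9 to 15, and its conjunction with the initial formula $\forall j.\, a[j] = 1$ is unsatisfiable because it requires $a[j_1] = 9$.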



4. Termination: a semantic analysis 

Termination of our tableaux calculus (and of the algorithm of Figure 1(a)) is not
guaranteed in general, as safety problems are undecidable even when the data structures
manipulated by the system are simple (Section 4.2). However, it is possible to identify sufficient
conditions for termination (Section 4.3) which are useful in some applications. We begin
by introducing an important definition to be used in this and the following section.
4.1. Configurations. A state of our array-based system $S = (a, I, \tau)$ is a pair $(s, \mathcal{M})$,
where $\mathcal{M}$ is a model of $\mathcal{A}^E_I$ and $s \in \mathtt{ARRAY}^{\mathcal{M}}$. Recalling the last part of the statement
of Theorem 3.3, we can focus on a sub-class of the states (often called configurations) by
restricting $\mathcal{M}$ to be a finite index model. Formally, an $\mathcal{A}^E_I$-configuration (or, simply, a
configuration) is a pair $(s, \mathcal{M})$ such that $s$ is an array of a finite index model $\mathcal{M}$ of $\mathcal{A}^E_I$ ($\mathcal{M}$
is omitted whenever it is clear from the context). We associate a $\Sigma_I$-structure $s_I$ and a
$\Sigma_E$-structure $s_E$ with an $\mathcal{A}^E_I$-configuration $(s, \mathcal{M})$ as follows: the $\Sigma_I$-structure $s_I$ is simply
the finite structure $\mathcal{M}_I$, whereas $s_E$ is the smallest $\Sigma_E$-substructure of $\mathcal{M}_E$ containing
the image of $s$ (in other words, if $\mathtt{INDEX}^{\mathcal{M}} = \{c_1, \ldots, c_k\}$, then $s_E$ is the smallest
$\Sigma_E$-substructure containing $\{s(c_1), \ldots, s(c_k)\}$).



4.2. Undecidability of the safety problem. In the general case, safety problems are 
undecidable. The result is not surprising and we report it in the following for the sake of 
completeness. 

Theorem 4.1. The problem "given an $\exists^I$-formula $U$, decide whether the array-based
system $S$ is safe w.r.t. $U$" is undecidable (even if $T_E$ is locally finite).

The proof consists of a rather straightforward reduction from the reachability problem
of Minsky machines. See Appendix A for details.



4.3. Decidability of the safety problem: sufficient conditions. A specific feature of
array-based systems is that a pre-order among configurations can be defined. This is
the key ingredient in establishing the termination of the backward reachability procedure
(and thus the decidability of the related safety problem) and in characterizing the completeness
of invariant synthesis strategies (as will be shown in Section 5 below).

A pre-order $(P, \leq)$ is a set endowed with a reflexive and transitive relation; an upset,
also called an upward closed set, of such a pre-order is a subset $U \subseteq P$ such that ($p \in U$
and $p \leq q$ imply $q \in U$). An upset $U$ is finitely generated iff it is a finite union of cones,
where a cone is an upset of the form ${\uparrow} p = \{q \in P \mid p \leq q\}$ for some $p \in P$. Two elements
$p, q \in P$ are incomparable (equivalent) if neither (both) $p \leq q$ nor (and) $q \leq p$.

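We are ready to define a pre-order over configurations. Let $s, s'$ be configurations: we
say that $s' \leq s$ holds iff there are a $\Sigma_I$-embedding $\mu : s'_I \to s_I$ and a $\Sigma_E$-embedding
$\nu : s'_E \to s_E$ such that the set-theoretical compositions of $\mu$ with $s$ and of $s'$ with $\nu$ are
equal, i.e. $s \circ \mu = \nu \circ s'$. This is depicted in the following diagram:

$$\begin{array}{ccc} s'_I & \xrightarrow{\ \mu\ } & s_I \\ {\scriptstyle s'}\downarrow & & \downarrow{\scriptstyle s} \\ s'_E & \xrightarrow{\ \nu\ } & s_E \end{array}$$

In case $\mu$ and $\nu$ are both inclusions, we say that $s'$ is a sub-configuration of $s$.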
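For intuition, the ordering can be tested by brute force in a special case (our assumption): $T_E$ is an enumerated datatype and $T_I$ is either the pure theory of equality or the theory of linear orders. Since every element of an enumerated datatype is named by a constant and embeddings preserve constants, $\nu$ can only be the identity, so $s' \leq s$ holds iff some $\Sigma_I$-embedding $\mu$ (an injective map, strictly increasing in the case of linear orders) satisfies $s \circ \mu = s'$. Configurations are encoded, hypothetically, as tuples listing the value of the array at each index, in index order.

```python
from itertools import permutations


def config_leq(small, big, ordered=False):
    """Brute-force test of small <= big for configurations given as tuples of
    array values; index embeddings are injective maps, strictly increasing
    when `ordered` is True (linear-order topology)."""
    for mu in permutations(range(len(big)), len(small)):
        if ordered and any(mu[k] >= mu[k + 1] for k in range(len(mu) - 1)):
            continue  # not an embedding of linear orders
        if all(big[mu[c]] == small[c] for c in range(len(small))):
            return True  # s o mu = s' (nu is the identity on the datatype)
    return False


print(config_leq((1, 2), (2, 1)))                   # True: swap the two indexes
print(config_leq((1, 2), (2, 1), ordered=True))     # False: the order must be kept
print(config_leq((1, 2), (1, 3, 2), ordered=True))  # True: skip the middle index
```

The search is exponential in the size of the configurations; it is meant only to make the definition concrete.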

Finitely generated upsets of configurations and $\exists^I$-formulae can be used interchangeably
under suitable assumptions. Let $K(a)$ be an $\exists^I$-formula; we let $[\![K]\!] := \{(s, \mathcal{M}) \mid \mathcal{M} \models K(s)\}$.

Proposition 4.2. For every $\exists^I$-formula $K(a)$, the set $[\![K]\!]$ is upward closed. For all
$\exists^I$-formulae $K_1, K_2$, we have that $[\![K_1]\!] \subseteq [\![K_2]\!]$ iff $\mathcal{A}^E_I \models K_1 \to K_2$.
Proof. Let us first show that the set $[\![K]\!]$ is upward closed. By using disjunctive normal
forms and distributing existential quantifiers over disjunctions, we can suppose, without
loss of generality, that $K(a)$ is of the form $\exists i\, \phi(i, a[i])$, where $\phi$ is a conjunction of $\Sigma_I \cup \Sigma_E$-literals
(the general case follows from this one because a union of upsets is an upset). If we
also separate $\Sigma_I$- and $\Sigma_E$-literals, we can suppose that $\phi(i, a[i])$ is of the kind $\phi_I(i) \wedge \phi_E(a[i])$,
where $\phi_I$ is a conjunction of $\Sigma_I$-literals and $\phi_E$ is a conjunction of $\Sigma_E$-literals. Suppose
now that $(s, \mathcal{M})$ and $(t, \mathcal{N})$ are configurations such that $s \leq t$ and $\mathcal{M} \models K(s)$: we wish
to prove that $\mathcal{N} \models K(t)$. From $\mathcal{M} \models K(s)$, it follows that there are elements $i$ from
$\mathtt{INDEX}^{\mathcal{M}}$ such that $\mathcal{M} \models \phi_I(i) \wedge \phi_E(s[i])$, i.e. such that $s_I \models \phi_I(i)$ and $s_E \models \phi_E(s(i))$ (to
infer the latter, recall that the operations $a[i]$ are interpreted as functional applications in
our models and also that truth of quantifier free formulae is preserved when considering
substructures). Now $s \leq t$ says that there are embeddings $\mu : s_I \to t_I$ and $\nu : s_E \to t_E$
such that $\nu \circ s = t \circ \mu$. Since truth of quantifier free formulae is preserved when considering
superstructures, we get $t_I \models \phi_I(\mu(i))$ and $t_E \models \phi_E(\nu(s(i)))$ (that is, $t_E \models \phi_E(t(\mu(i)))$) and
also $\mathcal{N} \models \phi_I(\mu(i)) \wedge \phi_E(t[\mu(i)])$, which implies $\mathcal{N} \models K(t)$, as desired.

Let us now prove the second claim of the Proposition. That $\mathcal{A}^E_I \models K_1 \to K_2$ implies
$[\![K_1]\!] \subseteq [\![K_2]\!]$ is trivial. Suppose conversely that $\mathcal{A}^E_I \not\models K_1 \to K_2$, which means that
$K_1(a) \wedge \neg K_2(a)$ is $\mathcal{A}^E_I$-satisfiable: since this implies that $K_1(a) \wedge \neg K_2(a)$ is satisfiable in a
finite index model of $\mathcal{A}^E_I$ (see Theorem 3.3), we immediately get that $[\![K_1]\!] \not\subseteq [\![K_2]\!]$. □



Before continuing, we recall the standard model-theoretic notion of Robinson diagrams
and some related results (see, e.g., [22] for more details). Let $\mathcal{M} = (M, \mathcal{I})$ be a $\Sigma$-structure
which is generated by $G \subseteq M$. Let us take a free variable $x_g$ for every $g \in G$ and call $G_x$
the set $\{x_g \mid g \in G\}$.⁵ The $\Sigma(G_x)$-diagram $\delta_{\mathcal{M}}(G)$ of $\mathcal{M}$ is the set of all $\Sigma(G_x)$-literals $L$ such
that $\mathcal{M}, a \models L$, where $a$ is the assignment mapping $x_g$ to $g$.

The following celebrated result [22] is simple, but nevertheless very powerful, and it will
be used in the rest of the paper.

Lemma 4.3 (Robinson Diagram Lemma). Let $\mathcal{M} = (M, \mathcal{I})$ be a $\Sigma$-structure which is
generated by $G \subseteq M$ and let $\mathcal{N} = (N, \mathcal{J})$ be another $\Sigma$-structure. Then, there is a bijective
correspondence given by

$$\mu(g) = a(x_g) \qquad (4.1)$$

(for all $g \in G$) between assignments $a$ on $N$ such that $\mathcal{N}, a \models \delta_{\mathcal{M}}(G)$ and $\Sigma$-embeddings
$\mu : \mathcal{M} \to \mathcal{N}$.



In other words, (4.1) can be used to define $\mu$ from $a$ and conversely. Notice that an
embedding $\mu : \mathcal{M} \to \mathcal{N}$ is uniquely determined, in case it exists, by the image of the set
of generators $G$: this is because the fact that $G$ generates $\mathcal{M}$ implies (and is equivalent to)
the fact that every $c \in M$ is of the kind $t^{\mathcal{M}}(g)$, for some term $t$ and some $g$ from $G$.

The diagram $\delta_{\mathcal{M}}(G)$ usually contains infinitely many literals; however, there are
important cases where we can keep it finite.

Lemma 4.4. Suppose that $\mathcal{M}$ is a $\Sigma$-structure (where $\Sigma$ is a finite signature) whose support
$M$ is finite; then for every set $G \subseteq M$ of generators, there are finitely many literals from
$\delta_{\mathcal{M}}(G)$ having all remaining literals of $\delta_{\mathcal{M}}(G)$ as a logical consequence.

⁵One may wonder whether assuming "countably many variables" is too restrictive, since $G$ may be uncountable.
There are two ways to avoid this problem. First, we can use free constants instead of variables (this is the
standard solution). Second, we observe that in this paper we do not need to consider the case when $G$ is
uncountable, since in all our applications $G$ is finite.
Proof. Choose $\Sigma(G_x)$-terms $t_1, \ldots, t_n$ such that (under the assignment $a : x_g \mapsto g$) $M$ is
equal to the set of the elements assigned by $a$ to $t_1, \ldots, t_n$ (this is possible because the
elements of $G$ are generators and $M$ is finite); we also include the variables $x_g$, for $g \in G$,
among the $t_1, \ldots, t_n$. We can get the desired finite set $S$ of literals by taking the set of atoms of
the form

$$R(t_{i_1}, \ldots, t_{i_k}), \qquad f(t_{i_1}, \ldots, t_{i_k}) = t_{i_{k+1}}$$

(as well as their negations) which are true in $\mathcal{M}$ under the assignment $a$. In fact, modulo
$S$, it is easy to see by induction on the structure of the term $u$ that every $\Sigma(G_x)$-term $u$ is
equal to some $t_i$; it follows that every literal from $\delta_{\mathcal{M}}(G)$ is a logical consequence of $S$. □

Whenever the conditions of the above Lemma are true, we can take a finite conjunction
and treat $\delta_{\mathcal{M}}(G)$ as a single formula: notice that we are allowed to do so whenever $G$ is
finite and $\mathcal{M}$ is a model of a locally finite theory.

Proposition 4.5. Let $T_E$ be locally finite. It is possible to effectively associate

(i) an $\exists^I$-formula $K_s$ with every $\mathcal{A}^E_I$-configuration $(s, \mathcal{M})$ such that $[\![K_s]\!] = {\uparrow} s$;

(ii) a finite set $\{s_1, \ldots, s_n\}$ of $\mathcal{A}^E_I$-configurations with every $\exists^I$-formula $K$ such that $K$ is
$\mathcal{A}^E_I$-equivalent to $K_{s_1} \vee \cdots \vee K_{s_n}$.

As a consequence of (i) and (ii), finitely generated upsets of $\mathcal{A}^E_I$-configurations coincide
with sets of $\mathcal{A}^E_I$-configurations of the kind $[\![K]\!]$, for some $\exists^I$-formula $K$.

Proof. Ad (i): we take $G, G'$ to be the support of $s_I$ and the image of the support of
$s_I$ under the function $s$, respectively; clearly $G$ is a set of generators for $s_I$ and $G'$ is a
set of generators for $s_E$. Let us call the sets of variables $G_x, G'_x$ as $i := \{i_1, \ldots, i_n\}$ and
$e := \{e_1, \ldots, e_n\}$, respectively. We take $K_s$ to be

$$\exists i\,(\delta_{s_I}(i) \wedge \delta_{s_E}(a_0[i])) \qquad (4.2)$$

where $a_0$ is a fresh array variable (in other words, we take the diagrams $\delta_{s_I}(G)$, $\delta_{s_E}(G')$,
make in the latter the replacement $e \mapsto a_0[i]$, take their conjunction, and quantify existentially
over the $i$). For every configuration $(t, \mathcal{N})$, we have that $t \in [\![K_s]\!]$ iff $\delta_{s_I}(i) \wedge \delta_{s_E}(a_0[i])$ is
true in $\mathcal{N}$ under some assignment $a$ mapping the array variable $a_0$ to $t$, that is iff there are
embeddings $\mu : s_I \to t_I$ and $\nu : s_E \to t_E$ as prescribed by Lemma 4.3 (i.e. the Robinson
Diagram Lemma). These embeddings map the generators $G$ onto the indexes assigned to
the $i$ by $a$ and the generators $G'$ to the elements assigned by $a$ to the terms $a_0[i]$, which
means precisely that $t \circ \mu = \nu \circ s$. Thus $t \in [\![K_s]\!]$ is equivalent to $s \leq t$, as desired.

Ad (ii): modulo taking disjunctive normal forms, we can suppose that $K(a_0)$ is equal
to $\exists i \bigvee_k (\phi_k(i) \wedge \psi_k(a_0[i]))$, where the $\phi_k$'s are $\Sigma_I$-formulae, the $\psi_k$'s are $\Sigma_E$-formulae, and
$i := i_1, \ldots, i_m$. Since $T_I$ is locally finite, we can assume that for every representative $\Sigma_I$-term
$t$ there is an $i_s \in i$ such that $t = i_s$ is an $\mathcal{A}^E_I$-logical consequence of $\phi_k$, for all $k$: this
is achieved by conjoining (just once) equations like $i_s = t$, where the $i_s$ are new
existentially quantified variables and $t$ is a representative $\Sigma_I$-term in which only the original
existentially quantified variables occur. In this way, all elements in a substructure generated
by $i$ are named explicitly and so are their $a_0$-images $a_0[i]$ (otherwise said, modulo $\phi_k(i)$, for
every $\Sigma_I(i)$-term $t$, we have that $a_0[t]$ is equal to some of the $a_0[i]$).
Now, in a locally finite theory, every quantifier free formula $\theta$ having at most $m$ free
variables is equivalent to a disjunction of diagram formulae $\delta_{\mathcal{M}}(G)$, where $\mathcal{M}$ is a substructure
of a model of the theory and $G$ is a set of generators for $\mathcal{M}$ of cardinality at most $m$.⁶
If we apply this to both $T_I$ and $T_E$, we get that our $K(a_0)$ can be rewritten as

$$\bigvee_{A, B} \exists i\,(\delta_A(i) \wedge \delta_B(a_0[i]))$$

where $A$ ranges over the $m$-generated models of $T_I$ and $B$ over the $m$-generated sub-models
of $T_E$ (recall that $T_I$ is closed under substructures). Every such pair $(A, B)$ is either
$\mathcal{A}^E_I$-inconsistent (in case some equality among the generators of $A$ is not satisfied by the
corresponding generators of $B$) or it gives rise to a configuration $s$ such that $\exists i\,(\delta_A(i) \wedge \delta_B(a_0[i]))$
is precisely $K_s$. □



The formula $K_s$ from Proposition 4.5(i) will be called the diagram formula for the
configuration $s$.
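In a special case (our assumption: $T_I$ is the pure theory of equality and $T_E$ is an enumerated datatype whose constants are the possible values), the diagram formula (4.2) of a configuration with $n$ indexes reduces, modulo the theory, to asserting that $n$ pairwise distinct indexes carry the prescribed values; the other literals of the two diagrams follow from these. The helper below, whose name and output syntax are hypothetical, prints this formula for a configuration encoded as a tuple of array values.

```python
def diagram_formula(config):
    """Return (a textual rendering of) K_s for a configuration given as a tuple
    of array values, assuming T_I = pure equality and T_E = an enumerated datatype."""
    idx = [f"i{k}" for k in range(1, len(config) + 1)]
    # delta_{s_I}: the indexes are pairwise distinct.
    distinct = [f"{x} != {y}" for k, x in enumerate(idx) for y in idx[k + 1:]]
    # delta_{s_E} after the replacement e -> a0[i]: each index holds its value.
    values = [f"a0[{x}] = {v}" for x, v in zip(idx, config)]
    return f"exists {' '.join(idx)}. (" + " & ".join(distinct + values) + ")"


print(diagram_formula((1, 3)))
# exists i1 i2. (i1 != i2 & a0[i1] = 1 & a0[i2] = 3)
```

By Proposition 4.5(i), a configuration $t$ satisfies the printed formula exactly when $s \leq t$, i.e. when $t$ has two distinct indexes holding the values 1 and 3.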

The set $B(\tau, K)$ of configurations which are backward reachable from the configurations
satisfying a given $\exists^I$-formula $K$ is thus an upset, being the union of infinitely many upsets;
however, even when the latter are finitely generated, $B(\tau, K)$ need not be so. Under the
hypothesis of local finiteness of $T_E$, this is precisely what characterizes the termination of
the backward reachability procedure.

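Theorem 4.6 ([37]). Assume that $T_E$ is locally finite; let $K$ be an $\exists^I$-formula. If $K$ is safe,
then BReach in Figure 1(a) terminates iff $B(\tau, K)$ is a finitely generated upset.⁷

Proof. Suppose that $B(\tau, K)$ is a finitely generated upset. Notice that

$$B(\tau, K) = \bigcup_n [\![BR^n(\tau, K)]\!];$$

consequently, since $[\![BR^0(\tau, K)]\!] \subseteq [\![BR^1(\tau, K)]\!] \subseteq [\![BR^2(\tau, K)]\!] \subseteq \cdots$ and each of the finitely
many generators of $B(\tau, K)$ already belongs to some $[\![BR^{n_j}(\tau, K)]\!]$, we have
$B(\tau, K) = [\![BR^n(\tau, K)]\!] = [\![BR^{n+1}(\tau, K)]\!]$ for $n$ the largest of the $n_j$. By the second claim
of Proposition 4.2, this means that $\mathcal{A}^E_I \models BR^n(\tau, K) \leftrightarrow BR^{n+1}(\tau, K)$, i.e. that the algorithm halts.
Vice versa, if the algorithm halts, we have $\mathcal{A}^E_I \models BR^n(\tau, K) \leftrightarrow BR^{n+1}(\tau, K)$, hence
$[\![BR^n(\tau, K)]\!] = [\![BR^{n+1}(\tau, K)]\!] = B(\tau, K)$ and the upset $B(\tau, K)$ is finitely generated by
Proposition 4.5. □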



To derive a sufficient condition for termination from the Theorem above, we use the
notion of a wqo as in [1]. A pre-order $(P, \leq)$ is a well-quasi-ordering (wqo) iff for every
sequence

$$p_0, p_1, \ldots, p_i, \ldots \qquad (4.3)$$

of elements from $P$, there are $i < j$ with $p_i \leq p_j$.

Corollary 4.7. BReach always terminates whenever the pre-order on $\mathcal{A}^E_I$-configurations is
a wqo.



⁶Since the theory is locally finite, there are finitely many atoms whose free variables are included in a
given set of cardinality $m$. Maximal conjunctions of literals built on these atoms are either inconsistent
(modulo the theory) or satisfiable in an $m$-generated substructure of a model of the theory. Because of
maximality, these (maximal) conjunctions must be diagrams.

⁷If $K$ is unsafe, we already know that BReach terminates because it detects unsafety.
Proof. It is sufficient to show that in a wqo all upsets are finitely generated. This is a
well-known fact that can be proved for instance as follows. Let $U$ be an upset. If $U$ is
empty, then it is finitely generated. Otherwise pick $p_0 \in U$; if ${\uparrow} p_0 = U$, clearly $U$ is finitely
generated; otherwise, let $p_1 \in U \setminus {\uparrow} p_0$. At the $(i+1)$-th step, either $U = {\uparrow} p_0 \cup \cdots \cup {\uparrow} p_i$ and
$U$ is finitely generated, or we can pick $p_{i+1} \in U$ with $p_{i+1} \notin {\uparrow} p_0 \cup \cdots \cup {\uparrow} p_i$. Since the last
alternative sooner or later becomes impossible (because in an infinite sequence like (4.3),
we must have $p_j \in \bigcup_{i<j} {\uparrow} p_i$ for some $j$), we conclude that $U$ is finitely generated. □
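The argument in the proof above is constructive on any finite enumeration of an upset: keep an element only if it does not lie in the union of the cones of the elements kept so far. The Python sketch below (our illustration) applies this procedure to pairs of natural numbers under the component-wise ordering, which is a wqo by Dickson's Lemma; the bounded box used to enumerate the upset is an assumption of the example.

```python
def cone_generators(elements, leq):
    """Procedure from the proof of Corollary 4.7: scan `elements` and keep each
    one that is not in the union of the cones of the elements already kept."""
    kept = []
    for p in elements:
        if not any(leq(q, p) for q in kept):  # p is not above any kept element
            kept.append(p)
    return kept


def componentwise(p, q):
    return all(x <= y for x, y in zip(p, q))


# Upset {(x, y) | x + y >= 3} restricted to a 6 x 6 box, scanned by increasing x + y.
box = sorted(((x, y) for x in range(6) for y in range(6) if x + y >= 3), key=sum)
print(cone_generators(box, componentwise))  # [(0, 3), (1, 2), (2, 1), (3, 0)]
```

In a wqo, the list of kept elements can never grow forever, even along an infinite scan, since it is a sequence with no $i < j$ such that $p_i \leq p_j$; this is exactly why every upset is finitely generated.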

Termination of backward reachability for some classes of systems (already considered
in the literature) can be obtained from Corollary 4.7; some of these are briefly considered in
the example below. Although decidable, many of these cases have very bad computational
behavior, as a non-primitive recursive lower bound is known for them. For the detailed
formalization of the classes of systems mentioned in the example below, the interested reader
is pointed to the extended version of [37].

Example 4.8. We consider three classes of systems for which decidability of the safety
problem can be shown by using Corollary 4.7 and well-known results (such as Dickson's
Lemma, Higman's Lemma, or Kruskal's theorem; see, e.g., [35] for a survey) for proving
that the ordering on configurations is a wqo.

• Take $T_E$ to be an enumerated data-type theory and $T_I$ to be the pure theory of equality over the signature $\Sigma_I = \{=\}$: the preorder on $\mathcal{A}^I_E$-configurations is a wqo by Dickson's Lemma. In fact, if $T_E$ is the theory of a finite structure with support $\{e_1, \dots, e_k\}$, a configuration is uniquely determined by a $k$-tuple of natural numbers (the $j$-th of which counts the indexes $i$ for which $a[i] = e_j$ holds) and the configuration ordering is obtained by component-wise comparison. In this setting, one can formalize both cache-coherence [26] (see also Example 3.1) and broadcast protocols [33, 27].

• Take $T_E$ to be an enumerated data-type theory and $T_I$ to be the theory of total order: the preorder on $\mathcal{A}^I_E$-configurations is a wqo by Higman's Lemma. In fact, if $T_E$ is the theory of a finite structure with support $\{e_1, \dots, e_k\}$, a configuration is uniquely determined by a word on $\{e_1, \dots, e_k\}$ and the configuration ordering is simply the subword relation. In this setting, one can formalize Lossy Channel Systems [5, 50].

• Take $T_E$ to be the theory of rationals (with the standard ordering relation $<$) and $T_I$ to be the pure theory of equality over the signature $\Sigma_I = \{=\}$: the preorder on $\mathcal{A}^I_E$-configurations is a wqo by Kruskal's theorem. In fact, we can represent a configuration $(s, \mathcal{M})$ as a list $n_1, \dots, n_k$ of natural numbers (of length $k$): such a list encodes the information that $s_E$ is a $k$-element chain and that $n_1$ elements from $s_I$ are mapped by $s$ into the first element of the chain, $n_2$ elements from $s_I$ are mapped by $s$ into the second element of the chain, etc. If $w$ is the list for $s$ and $v$ is the list for $s'$, we have $s \leq s'$ iff $w$ is less than or equal component-wise to a subword of $v$. Termination is obtained from Kruskal's theorem by representing numbers as numerals and by using a binary function symbol $f$ to encode the precedence (thus, for instance, the list $1, 2, 2$ is represented as $f(succ(0), f(succ(succ(0)), succ(succ(0))))$); it is easily seen that, on these terms, the homeomorphic embedding [TU] behaves like our configuration ordering.
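The three orderings of Example 4.8 are easy to state operationally once configurations have been encoded as in the three items above (count vectors, words, and lists of naturals, respectively). The following Python sketch assumes those encodings; the function names are ours, not the paper's.

```python
def dickson_leq(u, v):
    """First item: component-wise comparison of count vectors."""
    return len(u) == len(v) and all(x <= y for x, y in zip(u, v))


def embeds(w, v, le):
    """True iff w maps letter by letter into a subword of v with w[i] `le` v[j].

    Greedy left-to-right matching suffices: matching each letter of w with the
    earliest admissible position of v never rules out a later match.
    """
    it = iter(v)
    # `any` consumes the shared iterator up to (and including) the match.
    return all(any(le(x, y) for y in it) for x in w)


def higman_leq(w, v):
    """Second item: the (scattered) subword relation on words."""
    return embeds(w, v, lambda x, y: x == y)


def chain_leq(w, v):
    """Third item: w is component-wise <= some subword of v."""
    return embeds(w, v, lambda x, y: x <= y)
```

For example, `chain_leq([1, 2, 2], [3, 0, 2, 5])` holds (match $1 \mapsto 3$, $2 \mapsto 2$, $2 \mapsto 5$), while `chain_leq([1, 2, 2], [2, 2])` fails because $v$ is too short.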

A final remark is in order. In the model checking literature of infinite state systems, an 
important property is that of 'monotonicity' p] (in an appropriate setting, this property is 
shown to be equivalent to the fact that the pre-image of an upset is still an upset). Such 
a property is not used in the proofs above as we work symbolically with definable upsets. 
However, it is possible to formulate it in our framework as follows: 
- if $(s, \mathcal{M})$, $(s', \mathcal{M}')$, and $(t, \mathcal{M})$ are configurations such that $s \leq s'$ and $\mathcal{M} \models \tau(s, t)$, then there exists $(t', \mathcal{M}')$ such that $t \leq t'$ and $\mathcal{M}' \models \tau(s', t')$.

The proof that such a property holds for transitions in the format (3.1) is easy and is left as an exercise to the reader (it basically depends on the fact that truth of existential formulae is preserved by superstructures).
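To make the monotonicity condition concrete, here is a small Python check on a hypothetical counter abstraction in the style of the first item of Example 4.8 (configurations are pairs of counts compared component-wise). A transition whose guard only requires *at least* some count behaves like an existential guard and turns out to be monotone; a guard testing a count for zero is a universal condition and breaks monotonicity. Both transitions and all names are illustrative assumptions; since they are deterministic, comparing the unique successors suffices.

```python
from itertools import product


def leq(s, t):
    return all(x <= y for x, y in zip(s, t))


def threshold_step(s):
    """Fires if s[0] >= 1: one process moves from state 0 to state 1."""
    return (s[0] - 1, s[1] + 1) if s[0] >= 1 else None


def zero_test_step(s):
    """Fires only if s[0] == 0, i.e. if *no* process is in state 0."""
    return (s[0], s[1] + 1) if s[0] == 0 else None


def is_monotone(step, bound):
    """Check, on [0, bound]^2: s <= s2 and s -> t imply s2 -> t2 with t <= t2."""
    box = list(product(range(bound + 1), repeat=2))
    for s, s2 in product(box, box):
        t = step(s)
        if t is None or not leq(s, s2):
            continue
        t2 = step(s2)
        if t2 is None or not leq(t, t2):
            return False
    return True
```

Here `is_monotone(threshold_step, 4)` holds, whereas the zero test fails already for $s = (0, 0) \leq s' = (1, 0)$, since $s$ can move but $s'$ cannot.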

5. Invariants Search 

It is well-known that invariants are useful for pruning the search space of backward 
reachability procedures and may help either to obtain or to speed up termination. 

5.1. Safety Invariants. First of all, we recall the basic notion of safety invariant. 

Definition 5.1. The $\forall^I$-formula $J(a)$ is a safety invariant for the safety problem consisting of the array-based system $S = (a, I, \tau)$ and the unsafe $\exists^I$-formula $U(a)$ iff the following conditions hold:

(i) $\mathcal{A}^I_E \models \forall a\,(I(a) \to J(a))$,

(ii) $\mathcal{A}^I_E \models \forall a\,\forall a'\,(J(a) \wedge \tau(a, a') \to J(a'))$, and

(iii) $\exists a.\,(U(a) \wedge J(a))$ is $\mathcal{A}^I_E$-unsatisfiable.

If we are not given the $\exists^I$-formula $U(a)$ and only conditions (i)-(ii) hold, then $J(a)$ is said to be an invariant for $S$.

Checking whether conditions (i), (ii), and (iii) above hold can be reduced, by trivial logical manipulations, to the $\mathcal{A}^I_E$-satisfiability of $\exists^{A,I}\forall^I$-formulae, which is decidable by Theorem 3.3. So, establishing whether a given $\forall^I$-formula $J(a)$ is a safety invariant can be completely automated.
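As a finite-state illustration of conditions (i)-(iii) (and not of the symbolic check based on Theorem 3.3), the following Python sketch enumerates the states of a hypothetical toy system explicitly. The system, `is_safety_invariant`, and the predicate names are all illustrative assumptions.

```python
def is_safety_invariant(states, init, tau, unsafe, J):
    """Check Definition 5.1 on an explicit finite system.

    init, unsafe and J are predicates on states; tau(s) lists the successors of s.
    """
    cond_i = all(J(s) for s in states if init(s))                 # I implies J
    cond_ii = all(J(t) for s in states if J(s) for t in tau(s))   # J is preserved by tau
    cond_iii = not any(unsafe(s) and J(s) for s in states)        # U and J unsatisfiable
    return cond_i and cond_ii and cond_iii


# Hypothetical toy system: a counter modulo 8, starting at 0, incremented by 2.
STATES = range(8)


def init(s):
    return s == 0


def tau(s):
    return [(s + 2) % 8]


def unsafe(s):
    return s == 3
```

The candidate $J(s) := s \text{ even}$ is a safety invariant. The candidate $J(s) := s \neq 3$ excludes the unsafe state but is not inductive (state 1 is sent to 3), so condition (ii) fails even though the system is safe.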

Property 5.2. Let $U$ be an $\exists^I$-formula. If there exists a safety invariant for $U$, then the array-based system $S = (a, I, \tau)$ is safe with respect to $U$.

Proof. For reductio, suppose that there is a safety invariant $J$ for $U$ and the array-based system $S = (a, I, \tau)$ is not safe w.r.t. $U$. This implies that, for some $n \geq 0$, the formula
$$I(a_0) \wedge \tau(a_0, a_1) \wedge \dots \wedge \tau(a_{n-1}, a_n) \wedge U(a_n) \qquad (5.1)$$
is $\mathcal{A}^I_E$-satisfiable. By using (i) and (ii) in Definition 5.1 (the latter $n$ times), we derive that $J(a_n) \wedge U(a_n)$ is $\mathcal{A}^I_E$-satisfiable, in contrast to (iii) in Definition 5.1. □



Thus, if we are given a suitable safety invariant, Property 5.2 can be used as the basis of the safety invariant method, which turns out to be more powerful than the basic backward reachability procedure in Figure 1(a).

Property 5.3. Let the procedure BReach in Figure 1(a) terminate on the safety problem consisting of the array-based system $S = (a, I, \tau)$ and unsafe formula $U(a)$. If BReach returns (safe, $B$), then $\neg B$ is a safety invariant for $U$.

Proof. Suppose that BReach exits the main loop at the $k$-th iteration by returning $B$; then $B$ is $\bigvee_{i=0}^{k} Pre^i(\tau, U)$, the formula $Pre^{k+1}(\tau, U) \wedge \neg B$ is $\mathcal{A}^I_E$-unsatisfiable, and the formulae $I \wedge Pre^i(\tau, U)$ (for $i = 0, \dots, k$) are also $\mathcal{A}^I_E$-unsatisfiable. The latter means that $\mathcal{A}^I_E \models \forall a\,(I(a) \to \neg B(a))$; for $i = 0$ (since $Pre^0(\tau, U)$ is $U$), we also get that $\exists a.\,(U(a) \wedge \neg B(a))$ is $\mathcal{A}^I_E$-unsatisfiable. To claim that $\neg B(a)$ is an invariant, we only need to check that $\mathcal{A}^I_E \models \forall a\,\forall a'\,(\neg B(a) \wedge \tau(a, a') \to \neg B(a'))$, i.e., that $\mathcal{A}^I_E \models \forall a\,(Pre(\tau, B)(a) \to B(a))$, which trivially holds since $Pre(\tau, B)$ is $\bigvee_{i=1}^{k+1} Pre^i(\tau, U)$ and hence implies $Pre^{k+1}(\tau, U) \vee B$ and consequently also $B$ (recall that $Pre^{k+1}(\tau, U) \wedge \neg B$ is $\mathcal{A}^I_E$-unsatisfiable). □

^Notice that the disjunction of $\exists^I$-formulae is (up to logical equivalence) an $\exists^I$-formula, so $B$ is itself an $\exists^I$-formula.
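The content of Property 5.3 can be replayed on an explicit finite transition graph instead of $\exists^I$-formulae. The Python sketch below is such a finite-state analogue: the loop keeps the current pre-image $P$ and the accumulated disjunction $B = Pre^0(\tau, U) \vee \dots \vee Pre^k(\tau, U)$ as in the proof above, and the final check confirms that the complement of $B$ satisfies Definition 5.1. The graph and all function names are hypothetical.

```python
def pre(edges, target):
    """Pre(tau, P) over an explicit edge set: states with a successor in target."""
    return {s for s, t in edges if t in target}


def breach(init, unsafe, edges):
    """Finite-state analogue of BReach; B accumulates Pre^0(U), ..., Pre^k(U)."""
    P, B = set(unsafe), set()
    while P - B:                 # "P and not B" is satisfiable
        if P & init:             # "I and P" is satisfiable
            return "unsafe", B
        B |= P
        P = pre(edges, P)
    return "safe", B


def complement_is_safety_invariant(states, init, unsafe, edges, B):
    """Check conditions (i)-(iii) of Definition 5.1 for J = states minus B."""
    J = set(states) - B
    return (init <= J
            and all(t in J for s, t in edges if s in J)
            and not (unsafe & J))


# Hypothetical graph: a cycle 0 -> 1 -> 2 -> 0 and a separate path 3 -> 4 -> 5 -> 5.
EDGES = {(0, 1), (1, 2), (2, 0), (3, 4), (4, 5), (5, 5)}
```

Starting from state 0, `breach({0}, {5}, EDGES)` returns `("safe", {3, 4, 5})` and the complement `{0, 1, 2}` is a safety invariant, whereas `breach({0}, {2}, EDGES)` reports `"unsafe"`.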

The converse of Property 5.3 does not hold: there might be a safety invariant even when BReach diverges, as illustrated by the following example.

Example 5.4. We consider an algorithm to insert an element $b[0]$ into a sorted array $b[1], \dots, b[n]$ (this can be seen as a sub-procedure of the insertion sort algorithm). To formalize this, let $\Sigma_I$ contain one binary predicate symbol $S$ and one constant symbol $0$, and let $T_I$ be the theory whose class of models consists of the substructures of the structure having the naturals as domain, with $0$ interpreted in the obvious way and $S$ interpreted as the graph of the successor function. For the sake of simplicity, we shall use a two-sorted theory for data and two array variables: let $T_E$ be the two-sorted theory whose class of models consists of the single two-sorted structure given by the Booleans (with the constants $\top, \bot$ interpreted as true and false, respectively) and the rationals (with the usual ordering relation $<$); the array variable $a$ is a collection of Boolean flags and the array variable $b$ is the sorted numerical array where $b[0]$ should be inserted. The initial $\forall^I$-formula is represented as follows:
$$\forall i\,(a[i] = \bot \leftrightarrow i \neq 0) \wedge \forall i_1, i_2\,(S(i_1, i_2) \to i_1 = 0 \vee b[i_1] \leq b[i_2]),$$
saying that the elements in the array $b$ whose corresponding Boolean flag is set to false (namely, all except the one at position 0) are arranged in increasing order. The procedure can be formalized by using just one transition formula in the format (3.1), whose guard and global component are as follows:

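$$\phi_L(i_1, i_2, a[i_1], a[i_2], b[i_1], b[i_2]) := S(i_1, i_2) \wedge a[i_1] = \top \wedge a[i_2] = \bot \wedge b[i_1] > b[i_2]$$

$$F_G(i_1, i_2, a[i_1], a[i_2], b[i_1], b[i_2], j) := \begin{array}{l} \mathtt{if}\ (j = i_1)\ \mathtt{then}\ \langle \top, b[i_2] \rangle \\ \mathtt{else\ if}\ (j = i_2)\ \mathtt{then}\ \langle \top, b[i_1] \rangle \\ \mathtt{else}\ \langle a[j], b[j] \rangle, \end{array}$$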

which swaps two elements in the array $b$ if their order is decreasing and sets the Boolean flags appropriately (notice that $F_G$ updates a pair of array variables, whose first component is the new value of $a$ and whose second component is the new value of $b$). The obvious correctness property is that no two adjacent values in the array $b$ are in decreasing order when the corresponding Boolean flags do not allow the transition to fire; its violation is described by the unsafe formula
$$\exists i_1, i_2\,(S(i_1, i_2) \wedge \neg(a[i_1] = \top \wedge a[i_2] = \bot) \wedge b[i_1] > b[i_2]). \qquad (5.2)$$

Unfortunately, BReach in Figure 1(a) diverges when applied to (5.2). Fortunately, a safety invariant for (5.2) exists. This can be obtained as follows: run MCMT on the safety problem given by the disjunction of (5.2) and the formula
$$\exists i, j.\,(S(i, j) \wedge a[i] = \bot \wedge a[j] = \top) \qquad (5.3)$$
saying that two adjacent indexes have their Boolean flags set to $\bot$ and $\top$, respectively. The problem is immediately solved by the tool: by Property 5.3, the negation of the formula describing the set of backward reachable states is a safety invariant for the safety problem given by the disjunction of (5.2) and (5.3), hence a fortiori also for the safety formula (5.2) alone. In this case, formula (5.3) has been found manually; however, MCMT can find it without user intervention as soon as its invariant synthesis capabilities are activated by suitable command line options. The combination of automatic invariant search and backward reachability will be the main subject of Section [pTl] below. □

^More significant examples having a similar behavior can be found in the mcmt distribution.
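The claims of Example 5.4 can be sanity-checked on small concrete instances. The Python sketch below is a bounded, explicit-state simulation (not what MCMT does): for a fixed array $b$ satisfying the initial formula, it starts from $a = (\top, \bot, \dots, \bot)$, applies the single transition exhaustively, and lets us check that no reachable state satisfies (5.2) or (5.3). Python Booleans stand for $\top/\bot$, tuples for the arrays, and all names are ours.

```python
def successors(a, b):
    """One step of the transition of Example 5.4 (S(i1, i2) means i2 = i1 + 1)."""
    for i1 in range(len(a) - 1):
        i2 = i1 + 1
        if a[i1] and not a[i2] and b[i1] > b[i2]:      # guard phi_L
            a2, b2 = list(a), list(b)
            a2[i1], b2[i1] = True, b[i2]               # j = i1: <T, b[i2]>
            a2[i2], b2[i2] = True, b[i1]               # j = i2: <T, b[i1]>
            yield tuple(a2), tuple(b2)


def violates_52(a, b):
    """Unsafe formula (5.2)."""
    return any(not (a[i] and not a[i + 1]) and b[i] > b[i + 1]
               for i in range(len(a) - 1))


def satisfies_53(a, b):
    """Formula (5.3): adjacent flags set to F and T, respectively."""
    return any(not a[i] and a[i + 1] for i in range(len(a) - 1))


def reachable(b):
    """All states reachable from a = (T, F, ..., F) and the given b."""
    start = ((True,) + (False,) * (len(b) - 1), tuple(b))
    seen, stack = {start}, [start]
    while stack:
        for nxt in successors(*stack.pop()):
            if nxt not in seen:
                seen.add(nxt)
                stack.append(nxt)
    return seen
```

For $b = (9, 1, 2, 3)$, for instance, the element $9$ travels to the end in three swaps, producing four reachable states; in every instance tried, the unique final state has $b$ sorted and neither (5.2) nor (5.3) is ever satisfied.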



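It is interesting to rephrase the conditions of Definition 5.1 in terms of configurations, as this paves the way to characterizing the completeness of our invariant synthesis method, as will be shown below.

Lemma 5.5. Let $J$ be a $\forall^I$-formula; the conditions (i), (ii), and (iii) of Definition 5.1 are equivalent to the following three conditions on (sets of) configurations:
$$[\![I]\!] \cap [\![H]\!] = \emptyset \qquad (5.4)$$
$$[\![Pre(\tau, H)]\!] \subseteq [\![H]\!] \qquad (5.5)$$
$$[\![U]\!] \subseteq [\![H]\!] \qquad (5.6)$$
where $H$ is the $\exists^I$-formula which is logically equivalent to the negation of $J$.

Proof. For (5.4), we have:
$$\begin{array}{rcl}
\text{(i) of Def. 5.1} & \Leftrightarrow & \mathcal{A}^I_E \models \forall a.\,(I(a) \to J(a)) \\
& \Leftrightarrow & \neg\forall a.\,(I(a) \to J(a)) \text{ is } \mathcal{A}^I_E\text{-unsat.} \\
& \Leftrightarrow & \exists a.\,(I(a) \wedge \neg J(a)) \text{ is } \mathcal{A}^I_E\text{-unsat.} \\
& \Leftrightarrow & \exists a.\,(I(a) \wedge H(a)) \text{ is } \mathcal{A}^I_E\text{-unsat.} \\
& \Leftrightarrow & [\![I]\!] \cap [\![H]\!] = \emptyset.
\end{array}$$
For (5.5), we have:
$$\begin{array}{rcl}
\text{(ii) of Def. 5.1} & \Leftrightarrow & \mathcal{A}^I_E \models \forall a, a'.\,(J(a) \wedge \tau(a, a') \to J(a')) \\
& \Leftrightarrow & \exists a, a'.\,\neg(J(a) \wedge \tau(a, a') \to J(a')) \text{ is } \mathcal{A}^I_E\text{-unsat.} \\
& \Leftrightarrow & \exists a, a'.\,(J(a) \wedge \tau(a, a') \wedge \neg J(a')) \text{ is } \mathcal{A}^I_E\text{-unsat.} \\
& \Leftrightarrow & \exists a.\,(J(a) \wedge \exists a'.\,(\tau(a, a') \wedge \neg J(a'))) \text{ is } \mathcal{A}^I_E\text{-unsat.} \\
& \Leftrightarrow & \exists a.\,(J(a) \wedge \exists a'.\,(\tau(a, a') \wedge H(a'))) \text{ is } \mathcal{A}^I_E\text{-unsat.} \\
& \Leftrightarrow & \exists a.\,(J(a) \wedge Pre(\tau, H)(a)) \text{ is } \mathcal{A}^I_E\text{-unsat.} \\
& \Leftrightarrow & \mathcal{A}^I_E \models \forall a.\,(\neg J(a) \vee \neg Pre(\tau, H)(a)) \\
& \Leftrightarrow & \mathcal{A}^I_E \models \forall a.\,(H(a) \vee \neg Pre(\tau, H)(a)) \\
& \Leftrightarrow & \mathcal{A}^I_E \models \forall a.\,(Pre(\tau, H)(a) \to H(a)) \\
& \Leftrightarrow & [\![Pre(\tau, H)]\!] \subseteq [\![H]\!].
\end{array}$$
For (5.6), we have:
$$\begin{array}{rcl}
\text{(iii) of Def. 5.1} & \Leftrightarrow & \exists a.\,(U(a) \wedge J(a)) \text{ is } \mathcal{A}^I_E\text{-unsat.} \\
& \Leftrightarrow & \exists a.\,\neg(\neg U(a) \vee \neg J(a)) \text{ is } \mathcal{A}^I_E\text{-unsat.} \\
& \Leftrightarrow & \mathcal{A}^I_E \models \forall a.\,(U(a) \to \neg J(a)) \\
& \Leftrightarrow & \mathcal{A}^I_E \models \forall a.\,(U(a) \to H(a)) \\
& \Leftrightarrow & [\![U]\!] \subseteq [\![H]\!].
\end{array}$$
□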
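Lemma 5.5 can be sanity-checked on finite, explicit transition systems by reading $[\![\cdot]\!]$ as plain sets of states, $Pre(\tau, H)$ as the set of states having a successor in $H$, and $H$ as the complement of $J$. The Python sketch below (random toy systems; all names are hypothetical) compares conditions (i)-(iii) with (5.4)-(5.6) component by component.

```python
import random


def pre(edges, target):
    """Pre(tau, H): states with at least one tau-successor in target."""
    return {s for s, t in edges if t in target}


def def_5_1(edges, I, U, J):
    """Conditions (i)-(iii) of Definition 5.1 over a finite state set."""
    return (I <= J,
            all(t in J for s, t in edges if s in J),
            not (U & J))


def lemma_5_5(states, edges, I, U, J):
    """Conditions (5.4)-(5.6) on H, the complement of J."""
    H = states - J
    return (not (I & H), pre(edges, H) <= H, U <= H)


def check_equivalence(trials, seed=0):
    """Compare both triples of conditions on random small systems."""
    rng = random.Random(seed)
    for _ in range(trials):
        n = rng.randint(1, 6)
        states = set(range(n))
        edges = {(rng.randrange(n), rng.randrange(n))
                 for _ in range(rng.randint(0, 2 * n))}
        I, U, J = ({s for s in states if rng.random() < 0.5} for _ in range(3))
        if def_5_1(edges, I, U, J) != lemma_5_5(states, edges, I, U, J):
            return False
    return True
```

For instance, with edges $0 \to 1 \to 2$, $I = \{0\}$, $U = \{2\}$ and $J = \{0, 1\}$, both functions return `(True, False, True)`: $J$ excludes the unsafe state but is not preserved by the transition $1 \to 2$.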
5.2. Invariant Synthesis. The main difficulty in exploiting Property 5.2 is to find suitable $\forall^I$-formulae satisfying conditions (i)-(iii) of Definition 5.1. Unfortunately, the set of $\forall^I$-formulae which are candidates to become safety invariants is infinite. Such a search space can be dramatically restricted when $T_E$ is locally finite, although it is still infinite because there is no bound on the length of the universally quantified prefix. From a technical point of view, we need to develop some preliminary results.

First, we take a closer look at the equivalence relation among configurations: we recall that $s$ is equivalent to $t$ (written $s \approx t$) iff $s \leq t$ and $t \leq s$.

Proposition 5.6. We have that $s \approx t$ holds iff there are a $\Sigma_I$-isomorphism $\mu$ and a $\Sigma_E$-isomorphism $\nu$ such that the set-theoretical compositions of $\mu$ with $t$ and of $s$ with $\nu$ are equal, i.e., $t \circ \mu = \nu \circ s$. The situation is depicted in the following diagram:
$$\begin{array}{ccc} s_I & \xrightarrow{\;\mu\;} & t_I \\ {\scriptstyle s}\big\downarrow & & \big\downarrow{\scriptstyle t} \\ s_E & \xrightarrow{\;\nu\;} & t_E \end{array}$$

Proof. The implication ($\Leftarrow$) is straightforward, and thus we detail only ($\Rightarrow$). The supports of $s_I$ and of $t_I$ are finite, hence the existence of embeddings $s_I \xrightarrow{\mu_1} t_I \xrightarrow{\mu_2} s_I$ means (for cardinality reasons) that $\mu_1, \mu_2$ are bijections, hence isomorphisms. Since the images of $s$ and $t$ are finite sets of generators for $s_E$ and $t_E$, respectively, we have embeddings $s_E \xrightarrow{\nu_1} t_E \xrightarrow{\nu_2} s_E$ mapping generators into generators: again, for cardinality reasons, $\nu_1, \nu_2$ restrict to bijections among generators, which means that they are isomorphisms. □

Definition 5.7. A basis for a finitely generated upset $S$ (resp., for an $\exists^I$-formula $K$) is a minimal finite set $\{s_1, \dots, s_n\}$ such that $S$ (resp., $[\![K]\!]$) is equal to ${\uparrow}s_1 \cup \dots \cup {\uparrow}s_n$.

It is easy to see that two bases for the same upset are essentially the same, in the sense that they are formed by pairwise equivalent configurations. Suppose in fact that $\{s_1, \dots, s_n\}$ and $\{s'_1, \dots, s'_m\}$ are two bases for the same upset. Then for every $s_j$ there exists $s'_l$ such that $s'_l \leq s_j$; however, there is also $s_k$ with $s_k \leq s'_l$ (because $\{s_1, \dots, s_n\}$ is a basis), and by minimality it follows that $s_j = s_k$, which means that $s_j$ and $s'_l$ are equivalent. Thus each member of a basis is equivalent to a member of the other (and to a unique one, by minimality again) and vice versa; in particular, we also have that $m = n$.
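For the Dickson ordering of Example 4.8, where the preorder is a partial order and equivalent configurations coincide, a basis can be extracted from any finite set of generators by discarding non-minimal elements; different generating sets of the same upset then yield literally the same basis, in line with the uniqueness argument above. A minimal Python sketch (names hypothetical):

```python
def leq(s, t):
    return all(x <= y for x, y in zip(s, t))


def basis(generators):
    """Minimal subset of `generators` (tuples in N^k) generating the same upset."""
    result = []
    for g in sorted(set(generators), key=sum):    # smaller coordinate sums first
        if not any(leq(b, g) for b in result):    # g is not above a kept element
            result.append(g)
    return result


def same_upset(gens1, gens2):
    """up(gens1) == up(gens2) iff each generator lies above one of the other set."""
    return (all(any(leq(h, g) for h in gens2) for g in gens1)
            and all(any(leq(g, h) for g in gens1) for h in gens2))
```

Processing by increasing coordinate sum is what makes a single pass enough: an element can only be dominated by elements with a smaller or equal sum, and those have already been examined.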

Lemma 5.8. Suppose $T_E$ is locally finite. A configuration $s$ belongs to a basis for an $\exists^I$-formula $K$ iff $s \in [\![K]\!]$ and, for every $s'$, ($s' \leq s$ and $s' \in [\![K]\!]$) implies $s \approx s'$.

Proof. Let $B$ be a basis for $K$ and let also $s \in B$, $s' \leq s$ and $s' \in [\![K]\!]$; then $s'$ is bigger than some configuration from $B$, which must be $s$, because elements from $B$ are incomparable: $s \approx s'$ follows immediately. Conversely, suppose that $s \in [\![K]\!]$ and, for every $s'$, $s' \leq s$ and $s' \in [\![K]\!]$ imply that $s \approx s'$. Since $T_E$ is locally finite, $K$ has a basis $B$ (this can be immediately deduced from Proposition 4.5 ii)). We have $b \leq s$ (and hence also $s \approx b$) for some $b$ from $B$: it is now clear that we can get another basis for $K$ by replacing in $B$ the configuration $b$ with $s$. □

^Notice that, since the image of $s$ is a set of generators for $s_E$, it is not difficult to see that $\nu$ is uniquely determined by $\mu$ (i.e., given $\mu$, there might be no $\nu$ such that the square commutes, but in case one such $\nu$ exists, it is unique). Observe also that, if $s$ comes from the finite index model $\mathcal{M}$ and $t$ comes from the finite index model $\mathcal{N}$, the fact that $s \approx t$ holds does not mean that $\mathcal{M}$ and $\mathcal{N}$ are isomorphic: their $\Sigma_I$-reducts are $\Sigma_I$-isomorphic, but their $\Sigma_E$-reducts need not be $\Sigma_E$-isomorphic (only the $\Sigma_E$-substructures $s_E$ and $t_E$ are $\Sigma_E$-isomorphic).

Our goal is to integrate the safety invariant method into the basic Backward Reachability algorithm of Figure 1(a). To this end, we introduce the notion of 'sub-reachability.'

Definition 5.9 (Sub-reachable configurations). Suppose $T_E$ is locally finite and let $s$ be a configuration. A predecessor of $s$ is any $s'$ that belongs to a basis for $Pre(\tau, K_s)$ (see Proposition 4.5 for the definition of $K_s$). Let $s, s'$ be configurations: $s$ is sub-reachable from $s'$ iff there exist configurations $s_0, \dots, s_n$ such that (i) $s_0 = s$, (ii) $s_n = s'$, and (iii) either $s_{i-1} \leq s_i$ or $s_{i-1}$ is a predecessor of $s_i$, for each $i = 1, \dots, n$. If $K$ is an $\exists^I$-formula, $s$ is sub-reachable from $K$ iff $s$ is sub-reachable from some $s'$ taken from a basis of $K$.
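Sub-reachability can be explored mechanically once both kinds of moves, going down in the configuration ordering and going to a predecessor, are computable. The Python sketch below assumes, purely for illustration, the Dickson setting of Example 4.8 (configurations are count vectors, whose downsets are finite) and Petri-net-like transitions given by a consumed vector `pre` and a produced vector `post`. For such a transition, the upset $Pre(\tau, {\uparrow}s)$ is generated by the single vector $pre + \max(s - post, 0)$, which is therefore the predecessor of $s$. A depth bound `steps` keeps the search finite; all names are hypothetical.

```python
from itertools import product


def predecessors(s, transitions):
    """Basis elements of Pre(tau, up(s)) for Petri-net-like transitions.

    A transition (pre, post) fires from x iff x >= pre and yields x - pre + post,
    so Pre(tau, up(s)) = up(pre + max(s - post, 0)).
    """
    return {tuple(p + max(si - q, 0) for si, p, q in zip(s, pre, post))
            for pre, post in transitions}


def below(s):
    """All t <= s in N^k (a finite set)."""
    return set(product(*(range(x + 1) for x in s)))


def sub_reachable(start, transitions, steps):
    """Configurations sub-reachable from `start` in at most `steps` moves."""
    reached, frontier = {start}, {start}
    for _ in range(steps):
        nxt = set()
        for s in frontier:
            nxt |= below(s) | predecessors(s, transitions)
        frontier = nxt - reached
        reached |= frontier
    return reached
```

With the single transition that moves a token from the first to the second counter, `sub_reachable((0, 1), [((1, 0), (0, 1))], 1)` yields $\{(0, 1), (0, 0), (1, 0)\}$, and $(2, 0)$ appears only from the second step on (as the predecessor of $(1, 0)$).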

The following is the main technical result of this section. 

Theorem 5.10. Let $T_E$ be locally finite. If there exists a safety invariant for $U$, then there are finitely many $\mathcal{A}^I_E$-configurations $s_1, \dots, s_k$ which are sub-reachable from $U$ and such that $\neg(K_{s_1} \vee \dots \vee K_{s_k})$ is also a safety invariant for $U$.



Proof. Our goal is to replace an $\exists^I$-formula $H$ satisfying the three conditions of Lemma 5.5 with an $\exists^I$-formula $L$ whose negation is still a safety invariant for $U$ and whose basis is formed by configurations which are all sub-reachable from $U$. To this end, we consider a function $\gamma(S)$, where $S$ is an $\exists^I$-formula such that $[\![S]\!] \subseteq [\![H]\!]$: the function $\gamma(S)$ returns an $\exists^I$-formula $K_{a_1} \vee \dots \vee K_{a_n}$, where $\{a_1, \dots, a_n\} \subseteq [\![H]\!]$ is a minimal set of configurations taken from a basis of $H$ such that $[\![S]\!] \subseteq {\uparrow}a_1 \cup \dots \cup {\uparrow}a_n$. (Notice that this implies that $\{a_1, \dots, a_n\}$ is a basis of $\gamma(S)$ and $[\![S]\!] \subseteq [\![\gamma(S)]\!]$.)

Now, define the following sequence of $\exists^I$-formulae $L_i$: (i) $L_0 := \gamma(U)$ and (ii) $L_{i+1} := L_i \vee \gamma(Pre(\tau, L_i))$. (The definition is well given because $[\![L_i]\!] \subseteq [\![H]\!]$ is a consequence of (5.6) and (5.5).) What remains to be shown is that the sequence becomes stable and that its fix-point is the desired $L$, i.e., an $\exists^I$-formula whose negation is a safety invariant for $U$ and whose basis is formed by configurations which are sub-reachable from $U$.

We first show, by induction on $k$, that every configuration $b$ that belongs to a basis of $L_k$ is sub-reachable from $U$:

• if $k = 0$, we have that $\{a_1, \dots, a_n\}$ is a minimal set of configurations taken from a basis of $H$ such that $[\![U]\!] \subseteq {\uparrow}a_1 \cup \dots \cup {\uparrow}a_n$ and $b = a_j$ for some $j = 1, \dots, n$. By minimality, there is $s$ from a basis of $U$ such that $s \notin {\uparrow}a_1 \cup \dots \cup {\uparrow}a_{j-1} \cup {\uparrow}a_{j+1} \cup \dots \cup {\uparrow}a_n$, which means that $s \in {\uparrow}a_j$, that is, $a_j \leq s$, and $a_j = b$ is sub-reachable from $U$.

• Suppose now $k = i + 1 > 0$. A basis for $L_i \vee \gamma(Pre(\tau, L_i))$ is obtained by joining two bases (one for $L_i$ and one for $\gamma(Pre(\tau, L_i))$) and then by discarding non-minimal elements. As a consequence, if $b$ is in a basis for $L_{i+1}$, then $b$ is either in a basis for $L_i$ or in a basis for $\gamma(Pre(\tau, L_i))$ (or in both). In the former case, we just apply induction. If $b$ is in a basis for $\gamma(Pre(\tau, L_i))$, the same argument used in the case $k = 0$ shows that $b \leq s$ for an $s$ that belongs to a basis for $Pre(\tau, L_i)$. Now, if $c_1, \dots, c_m$ is a basis of $L_i$, the formula $Pre(\tau, L_i)$ is $\mathcal{A}^I_E$-equivalent to the disjunction of the $Pre(\tau, K_{c_j})$, and consequently $s$ must be in a basis of one of the latter (that is, $s$ is a predecessor of some $c_j$); since the $c_j$ are sub-reachable by induction hypothesis and $b \leq s$, the definition of sub-reachability guarantees that $b$ is sub-reachable from $U$.

^There might be many functions $\gamma$ satisfying the above specification; we just take one of them. This can be done (by the axiom of choice) because, given $S$ such that $[\![S]\!] \subseteq [\![H]\!]$, there always exists a minimal set of configurations $\{a_1, \dots, a_n\}$ taken from a basis of $H$ such that $[\![S]\!] \subseteq {\uparrow}a_1 \cup \dots \cup {\uparrow}a_n$ (just take any basis for $H$ and throw out configurations from it until minimality is acquired).
The increasing chain $[\![L_0]\!] \subseteq [\![L_1]\!] \subseteq \cdots$ becomes stationary, because at each step only configurations from a basis of $H$ can be added, and bases are (unique and) finite by definition. Thus, we have $[\![L_i]\!] = [\![L_{i+1}]\!]$ for some $i$: let $L$ be $L_i$ for such $i$.

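The fact that $\neg L$ is a safety invariant for $U$ is straightforward, since by Lemma 5.5 it suffices to check the three conditions (5.4)-(5.6) for $L$: condition $[\![I]\!] \cap [\![L]\!] = \emptyset$ follows from (5.4) and the fact that $[\![L]\!] \subseteq [\![H]\!]$, whereas conditions $[\![U]\!] \subseteq [\![L]\!]$ and $[\![Pre(\tau, L)]\!] \subseteq [\![L]\!]$ follow directly from the above definitions of $L_0$ and $L_{i+1}$ (we have $[\![U]\!] \subseteq [\![\gamma(U)]\!] = [\![L_0]\!] \subseteq [\![L]\!]$ and $[\![Pre(\tau, L)]\!] = [\![Pre(\tau, L_i)]\!] \subseteq [\![\gamma(Pre(\tau, L_i))]\!] \subseteq [\![L_{i+1}]\!] = [\![L]\!]$). □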

The intuition underlying the theorem is as follows. Let us call 'finitely representable' an upset which is of the kind $[\![K]\!]$ for some $\exists^I$-formula $K$, and let $B$ be the set of backward reachable states. Usually $B$ is infinite, and it is finitely representable only in special cases (e.g., when the configuration ordering is a wqo). Nevertheless, there may sometimes exist a set $B' \supseteq B$ which is finitely representable and whose complement is an invariant of the system. Theorem 5.10 ensures that we can find such a $B'$, if any exists. This is the case of Example 5.4, where not all configurations satisfying (5.3) are in $B$, and $B$ must be enlarged to encompass such configurations too (only in this way does it become finitely representable, witness the fact that backward reachability diverges).



In practice, Theorem 5.10 suggests the following procedure to find the superset $B'$. At each iteration of BReach, the algorithm represents symbolically in the variable $B$ the configurations which are backward reachable in $n$ steps; before computing the next pre-image of $B$, nondeterministically replace some of the configurations in a basis of $B$ with some of their sub-configurations (configurations below them in the configuration ordering), and update $B$ by a symbolic representation of the upset obtained in this way. As a consequence, if an invariant exists, we are guaranteed to find it (for a suitable sequence of nondeterministic choices); otherwise, the process may diverge. Notice that (under the local finiteness hypothesis for $T_E$) the search space of the configurations which are sub-reachable in $n$ steps is finite, although this search space is infinite if no bound on $n$ is fixed. To illustrate, formula (5.3) in Example 5.4 is satisfied by some configurations which are sub-reachable but not backward reachable. This shows that sub-reachability is crucial for Theorem 5.10 to hold.

The algorithm sketched above can be refined further so as to obtain a completely symbolic method working with formulae without resorting to configurations. The key idea to achieve this is to rephrase in a symbolic setting the relevant notions concerning sub-reachability. However, this goal is best achieved incrementally, as there are some subtle aspects to take care of. The starting point is the following observation. It is not possible to characterize the fact that a configuration $(s, \mathcal{M})$ is part of a basis for an $\exists^I$-formula $\exists \underline{i}\,\phi(\underline{i}, a[\underline{i}])$ by using another $\exists^I$-formula (a universal quantifier is needed to express the suitable minimality requirement). Instead, we shall characterize by an $\exists^I$-formula the fact that a tuple satisfying $\phi(\underline{i}, a[\underline{i}])$ generates a submodel which is a configuration belonging to a basis (see Lemma 5.11 below). Notice that the simple fact that the tuple satisfies $\phi$ is not sufficient alone: for instance, only pairs formed by identical elements satisfying $a[i_1] = a[i_2]$ generate a configuration in a basis (tuples formed by pairs of different elements are not minimal). To generalize this, we introduce the following abbreviation:

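$$Min(\phi, a, \underline{i}) := \phi(\underline{i}, a[\underline{i}]) \wedge \bigwedge_{\sigma} \Big( \phi(\underline{i}\sigma, a[\underline{i}\sigma]) \to \bigwedge_{i \in \underline{i}} \bigvee_{t} (t\sigma = i) \Big) \qquad (5.7)$$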

where $\phi(\underline{i}, a[\underline{i}])$ is a quantifier-free formula, $t$ ranges over representative $\Sigma_I(\underline{i})$-terms, and $\sigma$ ranges over the substitutions with domain $\underline{i}$ and co-domain included in the set of representative $\Sigma_I(\underline{i})$-terms. The following lemma gives a semantic characterization of $Min(\phi, a, \underline{i})$.
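Formula (5.7) can be evaluated directly in a concrete model when $\Sigma_I$ is relational: the representative $\Sigma_I(\underline{i})$-terms are then just the variables, so the substitutions $\sigma$ are the maps from the tuple of variables to itself. The Python sketch below rests on these assumptions, with index elements modeled as positions of a Python list and $\phi$ as a Python predicate; it reproduces the $a[i_1] = a[i_2]$ example from the text, and its second predicate adds the literal $i_1 \neq i_2$, i.e., the differentiated situation discussed in Remark 5.12. All names are hypothetical.

```python
from itertools import product


def holds_min(phi, a, idx):
    """Evaluate Min(phi, a, i) of (5.7) at the index tuple `idx`.

    Assumes a relational Sigma_I, so sigma ranges over maps from variable
    positions to variable positions; `phi(a, idx)` evaluates phi(i, a[i]).
    """
    k = len(idx)
    if not phi(a, idx):
        return False
    for sigma in product(range(k), repeat=k):
        img = tuple(idx[m] for m in sigma)       # the tuple i*sigma
        # conjunct: phi(i sigma) implies every i equals some t sigma
        if phi(a, img) and not set(idx) <= set(img):
            return False
    return True


def same_value(a, idx):
    """phi := a[i1] = a[i2]."""
    return a[idx[0]] == a[idx[1]]


def same_value_distinct(a, idx):
    """phi := a[i1] = a[i2] and i1 != i2 (a differentiated formula)."""
    return a[idx[0]] == a[idx[1]] and idx[0] != idx[1]
```

With `a = [5, 5, 7]`, the pair of positions `(0, 1)` satisfies `same_value` but not its `Min` (the substitution sending both variables to $i_1$ yields a smaller witness), while `(0, 0)` satisfies both; for the differentiated predicate, `Min` coincides with the formula itself.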

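Lemma 5.11. Consider an $\exists^I$-formula $K = \exists \underline{i}\,\phi(\underline{i}, a[\underline{i}])$, an $\mathcal{A}^I_E$-model $\mathcal{M}$, and a variable assignment $\alpha$ in $\mathcal{M}$ such that $(\mathcal{M}, \alpha) \models \phi(\underline{i}, a[\underline{i}])$. We have that $(\mathcal{M}, \alpha) \models Min(\phi, a, \underline{i})$ iff the configuration $s$ obtained by restricting $\alpha(a)$ to the $\Sigma_I$-substructure generated by the $\alpha(\underline{i})$'s belongs to a basis of $K$.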

Proof. Suppose that $(\mathcal{M}, \alpha) \models Min(\phi, a, \underline{i})$ (for simplicity, we shall directly call $\underline{i}, a$ the elements assigned by $\alpha$ to $\underline{i}, a$, respectively). By Proposition 5.6 and Lemma 5.8, it is sufficient to show the following. Consider $s' \leq s$ such that $s' \in [\![K]\!]$: we show that the embeddings $\mu, \nu$ witnessing the relation $s' \leq s$ and making the diagram
$$\begin{array}{ccc} s'_I & \xrightarrow{\;\mu\;} & s_I \\ {\scriptstyle s'}\big\downarrow & & \big\downarrow{\scriptstyle s} \\ s'_E & \xrightarrow{\;\nu\;} & s_E \end{array}$$
commute are isomorphisms (in fact, it is sufficient to show only that $\mu$ is bijective, because the images of $s'$ and $s$ are $\Sigma_E$-generators and the square commutes). Without loss of generality, we can assume that $\mu$ is an inclusion; the domain of $s'$ is then formed by elements of the form $t^{\mathcal{M}, \alpha}$ for suitable (representative) $\Sigma_I(\underline{i})$-terms $t$, and the fact that $s' \in [\![K]\!]$ then means that $(\mathcal{M}, \alpha) \models \phi(\underline{i}\sigma, a[\underline{i}\sigma])$ holds for a substitution $\sigma$ whose domain is $\underline{i}$ and whose range is contained in the set of those representative $\Sigma_I(\underline{i})$-terms $u$ such that $u^{\mathcal{M}, \alpha}$ is in the support of $s'_I$. Since $(\mathcal{M}, \alpha) \models Min(\phi, a, \underline{i})$ holds, for every $i \in \underline{i}$ there is a representative $\Sigma_I(\underline{i})$-term $t$ such that $(\mathcal{M}, \alpha) \models t\sigma = i$ holds. The latter means that $i$ is in the support of $s'_I$, hence the inclusion $\mu$ is onto.

Conversely, if $s$ belongs to a basis of $K$, then, by Lemma 5.8, no $s' \leq s$ is in $[\![K]\!]$ unless $s'$ is equivalent to $s$. Suppose that $(\mathcal{M}, \alpha) \models \phi(\underline{i}\sigma, a[\underline{i}\sigma])$ holds for a substitution $\sigma$ whose domain is $\underline{i}$ and whose range is included in the set of representative $\Sigma_I(\underline{i})$-terms. For reductio, suppose that $(\mathcal{M}, \alpha) \models t\sigma = i$ does not hold for some $i \in \underline{i}$ and all representative $\Sigma_I(\underline{i})$-terms $t$; we can restrict the array $a$ to the $\Sigma_I$-substructure given by the elements of the kind $(t\sigma)^{\mathcal{M}, \alpha}$, thus getting a configuration $s' \leq s$ such that $s' \in [\![K]\!]$. Since the finite support of $s'_I$ has smaller cardinality than the support of $s_I$ (because $\alpha(i)$ does not belong to it), we cannot have $s' \approx s$, a contradiction. □

12 To make the statement of the lemma precise, one should define not just $s$ but also the finite index model where $s$ is taken from. In detail, we take the $\mathcal{A}^I_E$-model $\mathcal{N}$ whose $\Sigma_I$-reduct is the restriction of $\mathcal{M}_I$ to the $\Sigma_I$-substructure generated by the $\alpha(\underline{i})$'s and whose $\Sigma_E$-reduct is equal to $\mathcal{M}_E$. In this model, we can define the array $s$ to be the restriction of $\alpha(a)$ to $\mathtt{INDEX}^{\mathcal{N}} \subseteq \mathtt{INDEX}^{\mathcal{M}}$. The pair $(s, \mathcal{N})$ is now a configuration in the sense defined in Section 5.2.
Remark 5.12. We identify conditions under which it is trivial to compute $Min(\phi, a, \underline{i})$. Besides being an interesting observation per se, it will be used later in this section to illustrate simple and useful examples of the key notion of cover (see Example 5.17 below). If (as often happens in applications) the signature $\Sigma_I$ is relational and the formula $\phi(\underline{i}, a[\underline{i}])$ is differentiated, $Min(\phi, a, \underline{i})$ is $\mathcal{A}^I_E$-equivalent to $\phi(\underline{i}, a[\underline{i}])$: this is because only variable permutations can be consistently taken into consideration as the $\sigma$'s in formula (5.7), so that the $\underline{i}\sigma$'s are precisely the $\underline{i}$'s.

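Corollary 5.13. Consider an $\exists^I$-formula $K := \exists \underline{i}\,\phi(\underline{i}, a[\underline{i}])$ and a configuration $(s, \mathcal{M})$; if $s$ belongs to a basis for $K$, then $(\mathcal{M}, \alpha) \models \phi(\underline{i}, a[\underline{i}]) \to Min(\phi, a, \underline{i})$ holds for all $\alpha$ such that $\alpha(a) = s$.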

Proof. If $(\mathcal{M}, \alpha) \models \phi(\underline{i}, a[\underline{i}])$, then the configuration $s'$ obtained by restricting $\alpha(a) = s$ to the $\Sigma_I$-substructure generated by the $\alpha(\underline{i})$ is equivalent to $s$ by Lemma 5.8 and hence belongs to a basis of $K$. Thus Lemma 5.11 applies and gives $(\mathcal{M}, \alpha) \models Min(\phi, a, \underline{i})$. □



The next step towards the goal of obtaining a completely symbolic method for mechanizing the result stated in Theorem 5.10 consists of finding a purely symbolic substitute for the function $\gamma$ used in the proof of Theorem 5.10. The following result is the key to achieving this.

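Proposition 5.14. Let $T_E$ be locally finite, $K := \exists \underline{i}.\,\phi(\underline{i}, a[\underline{i}])$ be an $\exists^I$-formula, and $L$ be an $\exists^I$-formula. The following two conditions are equivalent:

(i) for every $s$ in a basis for $K$, there exists a configuration $s'$ in a basis for $L$ such that $s \leq s'$;

(ii) $L$ is (up to $\mathcal{A}^I_E$-equivalence) of the form $\exists \underline{i}, \underline{j}.\,\psi(\underline{i}, \underline{j}, a[\underline{i}], a[\underline{j}])$ for a quantifier-free formula $\psi$, and
$$\text{if } \mathcal{A}^I_E \models Min(\psi, a, \underline{i}\underline{j}) \to \theta(\underline{t}, a[\underline{t}]) \text{ then } \mathcal{A}^I_E \models Min(\phi, a, \underline{i}) \to \theta(\underline{t}, a[\underline{t}]),$$
for all quantifier-free $(\Sigma_I \cup \Sigma_E)$-formulae $\theta$ and for all tuples of terms $\underline{t}$ taken from the set of the representative $\Sigma_I(\underline{i})$-terms.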

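Proof. Assume (i). We first apply a syntactic transformation to $L$ as follows. Let $B, B'$ be bases for $K, L$, respectively; we know that for every $(s, \mathcal{M}_s) \in B$ there is $(s^L, \mathcal{M}^L_s) \in B'$ such that $s \leq s^L$: the relationship $s \leq s^L$ is due to the existence of a pair of embeddings $(\mu_s, \nu_s)$ as required by the definition of the configuration ordering. For every $s \in B$ and for every assignment $\alpha$ such that $\alpha(a) = s$ and $(\mathcal{M}_s, \alpha) \models \phi(\underline{i}, a[\underline{i}])$, we build the diagram formula $K_\alpha$ for $s^L$ given by
$$\exists \underline{i}\,\exists \underline{k}\,(\delta_{s^L_I}(\underline{i}, \underline{k}) \wedge \delta_{s^L_E}(a[\underline{i}], a[\underline{k}])) \qquad (5.8)$$
where the variables $\underline{k}$ are names for the elements in the complement subset $supp(s^L_I) \setminus \mu_s(\alpha(\underline{i}))$ (here $supp(s^L_I)$ is the support of the $\Sigma_I$-structure $s^L_I$). Notice that formula (5.8) is nothing but formula (4.2) used in the proof of Proposition 4.5 i). Since, for a configuration $t$, the fact that $t \in [\![K_\alpha]\!]$ means that there are suitable embeddings witnessing that $s^L \leq t$, we have that $[\![L]\!] = [\![L \vee \bigvee_\alpha K_\alpha]\!]$; hence, by Proposition 4.2, the formula $L$ is $\mathcal{A}^I_E$-equivalent to $L \vee \bigvee_\alpha K_\alpha$. Up to logical equivalence, we can move the existentially quantified variables outside the disjunctions, so that $L$ is equivalent to a prenex existential formula of the kind $\exists \underline{i}\,\exists \underline{j}\,\psi$. With this new syntactic form, the following property holds: for every $s \in B$ and for every assignment $\alpha$ such that $\alpha(a) = s$ and $(\mathcal{M}_s, \alpha) \models \phi(\underline{i}, a[\underline{i}])$, there is an assignment $\alpha^L$ such that (i) $(\mathcal{M}^L_s, \alpha^L) \models \psi(\underline{i}, \underline{j}, a[\underline{i}], a[\underline{j}])$, (ii) $\alpha^L(\underline{i}) = \mu_s(\alpha(\underline{i}))$, and (iii) $\alpha^L(a) = s^L$. Since $s^L$ is in a basis of $L$, it follows also from Corollary 5.13 that $(\mathcal{M}^L_s, \alpha^L) \models Min(\psi, a, \underline{i}\underline{j})$. Suppose now that $\mathcal{A}^I_E \not\models Min(\phi, a, \underline{i}) \to \theta(\underline{t}(\underline{i}), a[\underline{t}(\underline{i})])$; by Lemma 5.11 (and by the fact that $\phi, \theta$ are quantifier-free), this means that there are a configuration $(s, \mathcal{M}_s) \in B$ and an assignment $\alpha$ such that $(\mathcal{M}_s, \alpha) \models \phi(\underline{i}, a[\underline{i}])$ and $(\mathcal{M}_s, \alpha) \not\models \theta(\underline{t}, a[\underline{t}])$. Since $\theta$ is quantifier-free, taking the assignment $\alpha^L$ satisfying (i)-(iii) above, we get that $(\mathcal{M}^L_s, \alpha^L) \not\models \theta(\underline{t}, a[\underline{t}])$, thus also $(\mathcal{M}^L_s, \alpha^L) \not\models Min(\psi, a, \underline{i}\underline{j}) \to \theta(\underline{t}, a[\underline{t}])$.

Conversely, assume (ii). Fix $(s, \mathcal{M}_s)$ in a basis $B$ for $K$ and an assignment $\alpha$ such that $(\mathcal{M}_s, \alpha) \models \phi(\underline{i}, a[\underline{i}])$; by Corollary 5.13, we have that $(\mathcal{M}_s, \alpha) \models Min(\phi, a, \underline{i})$. Let $\underline{t}$ be the representative $\Sigma_I(\underline{i})$-terms and let $\theta(\underline{t}(\underline{i}), a[\underline{t}(\underline{i})])$ be the negation of the formula $\delta_{s_I}(\underline{t}(\underline{i})) \wedge \delta_{s_E}(a[\underline{t}(\underline{i})])$. We have $(\mathcal{M}_s, \alpha) \not\models Min(\phi, a, \underline{i}) \to \theta(\underline{t}, a[\underline{t}])$, hence there are $\mathcal{N}$ and $\beta$ such that $(\mathcal{N}, \beta) \not\models Min(\psi, a, \underline{i}\underline{j}) \to \theta(\underline{t}, a[\underline{t}])$. By restricting the support of $\mathcal{N}_I$ if needed, we can suppose that $\mathcal{N}$ is a finite index model and that $\mathcal{N}_I$ is generated by the elements assigned by $\beta$ to the $\underline{i}\underline{j}$. Let $s'$ be $\beta(a)$: from Lemma 5.11 it follows that $s'$ is in a basis for $L$; also, from the fact that $(\mathcal{N}, \beta) \models \delta_{s_I}(\underline{t}(\underline{i})) \wedge \delta_{s_E}(a[\underline{t}(\underline{i})])$, we can conclude that $s \leq s'$, as desired. □

^It might happen here that duplicate variables are used because the $\alpha(\underline{i})$ need not be distinct. This is not a problem: if different index variables (say $i_1, i_2$) naming the same element are employed, the diagram formula will contain a conjunct like $i_1 = i_2$. The embedding property of the Robinson Diagram Lemma is not affected by these duplications.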

In the following, we will write $K \leq L$ whenever one of the (equivalent) conditions in Proposition 5.14 holds. We show that, under the working assumption that $T_E$ is locally finite, it is possible to compute all the finitely many (up to $\mathcal{A}^I_E$-equivalence) $\exists^I$-formulae $K$ such that $K \leq L$.

Proposition 5.15. Let $T_E$ be locally finite. Given an $\exists^I$-formula $L$, there are only finitely many (up to $\mathcal{A}^I_E$-equivalence) $\exists^I$-formulae $K$ such that $K \leq L$, and all such $K$ can be effectively computed.

Proof. Suppose that $L$ is of the form $\exists \underline{k}\,\gamma$. To use the criterion of Proposition 5.14 ii) in an effective way, we only need to find a bound for the length of the tuples $\underline{i}$ and $\underline{j}$. In fact, once the bound is known, the search space for formulae of the forms $\exists \underline{i}\,\exists \underline{j}\,\psi$ and $\exists \underline{i}\,\phi$ satisfying the conditions (which can be effectively checked by using Theorem 3.3)
$$\mathcal{A}^I_E \models \exists \underline{k}\,\gamma \leftrightarrow \exists \underline{i}\,\exists \underline{j}\,\psi, \quad\text{and, for all } \theta(\underline{t}(\underline{i}), a[\underline{t}(\underline{i})]),$$
$$\mathcal{A}^I_E \models Min(\psi, a, \underline{i}\underline{j}) \to \theta(\underline{t}, a[\underline{t}]) \;\Rightarrow\; \mathcal{A}^I_E \models Min(\phi, a, \underline{i}) \to \theta(\underline{t}, a[\underline{t}])$$
is finite. This is because $T_I$ and $T_E$ are both locally finite and hence there are only finitely many quantifier-free formulae of the required type involving a fixed number of index variables which are not $\mathcal{A}^I_E$-equivalent. The proof of Proposition 5.14 shows that the lengths of $\underline{i}$ and $\underline{j}$ are both bounded by the maximum cardinality $N$ of the support of $s_I$, where $s$ is a configuration that belongs to a basis for $L = \exists \underline{k}\,\gamma$. For $\underline{j}$, this is clear from the proof itself, while for $\underline{i}$ it is a consequence of the following considerations. First, we can restrict the search to formulae $K$ of the form $\exists \underline{i}\,\phi$ where the length of $\underline{i}$ is minimal, i.e., $K$ is not equivalent to a formula with a shorter existential prefix. Furthermore, by Proposition 4.5, $K$ is equivalent to $K_{s_1} \vee \dots \vee K_{s_n}$, where $\{s_1, \dots, s_n\}$ is a basis for $K$. In turn, by (4.2), this means that there must exist a configuration $t$ in a basis for $K$ such that the cardinality of $t_I$ is bigger than or equal to the length of $\underline{i}$; since $t \leq s$ for some $s$ in a basis for $L$ (see Proposition 5.14 i)), we have that the length of $\underline{i}$ cannot exceed $N$. To conclude, it is sufficient to observe that $N$ cannot be bigger than the number of the representative $\Sigma_I(\underline{k})$-terms. □

^There are infinitely many assignments $\alpha$, but only finitely many variables are mentioned in them, so that only finitely many formulae can be produced.

Definition 5.16. We say that $K$ covers $L$ iff both $K \le L$ and $\mathcal{A}^I_E \models L \to K$.

The following example illustrates the notions just introduced and will also be useful when discussing the implementation of our invariant synthesis technique (see Section 6.2 below).

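Example 5.17. Let $\Sigma_I$ be relational and $T_E$ be a locally finite theory admitting elimination of quantifiers. Let

$L := \exists\underline{i}\,\exists\underline{j}.\,(\psi_E(a[\underline{i}], a[\underline{j}]) \wedge \psi_I(\underline{i}, \underline{j}) \wedge \delta_I(\underline{i}))$    (5.9)

be a primitive differentiated and $\mathcal{A}^I_E$-satisfiable $\exists^I$-formula such that (i) $\underline{i} \cap \underline{j} = \emptyset$; (ii) $\psi_E(\underline{e}, \underline{d})$ is a conjunction of $\Sigma_E$-literals; (iii) $\psi_I(\underline{i}, \underline{j})$ is a conjunction of $\Sigma_I$-literals; (iv) $\delta_I(\underline{i})$ is a maximal conjunction of $\Sigma_I(\underline{i})$-literals (i.e. for every $\Sigma_I(\underline{i})$-atom $A(\underline{i})$, $\delta_I$ contains either $A(\underline{i})$ or its negation). If

$K := \exists\underline{i}\,(\delta_I(\underline{i}) \wedge \phi_E(a[\underline{i}]))$,    (5.10)

where $\phi_E(\underline{e})$ is $T_E$-equivalent to $\exists\underline{d}\,\psi_E(\underline{e}, \underline{d})$,¹⁵ then $K$ covers $L$ and, in particular, $K \le L$. We prove this fact in the following.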



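Proof. We use Proposition 5.14(ii): as shown in Remark 5.12, since $L$ and $K$ are differentiated, we can avoid mentioning the corresponding formulae Min in the condition of Proposition 5.14(ii) and just prove that

$\mathcal{A}^I_E \not\models \delta_I(\underline{i}) \wedge \phi_E(a[\underline{i}]) \to \theta(\underline{i}, a[\underline{i}]) \;\Rightarrow\; \mathcal{A}^I_E \not\models \delta_I(\underline{i}) \wedge \psi_I(\underline{i}, \underline{j}) \wedge \psi_E(a[\underline{i}], a[\underline{j}]) \to \theta(\underline{i}, a[\underline{i}])$

for every $\theta$ (notice that, since $\Sigma_I$ is relational, the only $\Sigma_I(\underline{i})$-terms are the $\underline{i}$). Pick a model $\mathcal{M}$ and an assignment $\mathtt{a}$ such that $(\mathcal{M}, \mathtt{a}) \models \delta_I(\underline{i}) \wedge \phi_E(a[\underline{i}])$ and $(\mathcal{M}, \mathtt{a}) \not\models \theta(\underline{i}, a[\underline{i}])$. We can freely assume that the support of $\mathcal{M}_I$ is a $\Sigma_I$-structure generated by the $\mathtt{a}(\underline{i})$; by modifying the value of $\mathtt{a}$ on the element variables $\underline{d}$, if needed, we can also assume that $(\mathcal{M}, \mathtt{a}) \models \psi_E(a[\underline{i}], \underline{d})$ (this is because $\phi_E(\underline{e})$ is $T_E$-equivalent to $\exists\underline{d}\,\psi_E(\underline{e}, \underline{d})$). Since $L$ is consistent, there are also a model $\mathcal{N}$ and an assignment $\mathtt{b}$ such that $(\mathcal{N}, \mathtt{b}) \models \delta_I(\underline{i}) \wedge \psi_I(\underline{i}, \underline{j}) \wedge \psi_E(a[\underline{i}], a[\underline{j}])$. Again, we can assume that the support of $\mathcal{N}_I$ is a $\Sigma_I$-structure generated by the $\mathtt{b}(\underline{i}, \underline{j})$; since $\delta_I(\underline{i})$ is maximal, it is a diagram formula, hence (up to an isomorphism) $\mathcal{M}_I$ is a substructure of $\mathcal{N}_I$. Let us now take the model $\mathcal{N}'$ whose $\Sigma_I$-reduct is $\mathcal{N}_I$ and whose $\Sigma_E$-reduct is $\mathcal{M}_E$. Let $\mathtt{b}'$ be the assignment which is like $\mathtt{b}$ as far as the index variables $\underline{i}, \underline{j}$ are concerned and which associates with the variable $a$ the array whose $\mathtt{b}'(\underline{i})$-values are the $\mathtt{b}'(\underline{i}) = \mathtt{a}(\underline{i})$-values of $\mathtt{a}(a)$ and whose $\mathtt{b}'(\underline{j})$-values are the $\mathtt{a}(\underline{d})$ (notice that this is correct because, by differentiatedness of $L$, the $\mathtt{b}'(\underline{i}, \underline{j})$ are all distinct). It turns out that $(\mathcal{N}', \mathtt{b}') \not\models \delta_I(\underline{i}) \wedge \psi_I(\underline{i}, \underline{j}) \wedge \psi_E(a[\underline{i}], a[\underline{j}]) \to \theta(\underline{i}, a[\underline{i}])$, as desired. □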



15 $\phi_E$ is guaranteed to exist as $T_E$ admits elimination of quantifiers.
We are now in a position to take the final step towards the goal of restating the results from Theorem 5.10 in a completely symbolic way. Let ChooseCover(L) be a procedure that non-deterministically returns one of the $\exists^I$-formulae $K$ such that $K$ covers $L$ (this procedure plays the role of the function $\gamma$ from the proof of Theorem 5.10). We consider the procedure SInv in Figure 1(b) for the computation of safety invariants and prove its correctness.

Theorem 5.18. Let $T_E$ be locally finite. Then there exists a safety invariant for $U$ iff the procedure SInv in Figure 1(b) returns a safety invariant for $U$, for a suitable ChooseCover function.

Proof. Suppose that SInv returns $B$ after $k + 1$ iterations of the loop: we show that $\neg B$ is a safety invariant. Notice that $B$ is a disjunction $P_0 \vee \cdots \vee P_k$ of $\exists^I$-formulae such that, for all $i = 0, \dots, k$,

(I): the formula $I \wedge P_i$ is not $\mathcal{A}^I_E$-satisfiable;

moreover, $P_i$ covers $Pre(\tau, P_{i-1})$ (for $i \ge 1$) and $P_0$ covers $U$, which means in particular that

(II): $\mathcal{A}^I_E \models \forall a\,(Pre(\tau, P_{i-1})(a) \to P_i(a))$ and $\mathcal{A}^I_E \models \forall a\,(U(a) \to P_0(a))$.

Finally, SInv exited the loop because, for some $P_{k+1}$ covering $Pre(\tau, P_k)$, the formula $P_{k+1} \wedge \neg B$ was not $\mathcal{A}^I_E$-satisfiable: these two conditions entail that

(III): $\mathcal{A}^I_E \models \forall a\,(Pre(\tau, P_k)(a) \to B(a))$.

Conditions (i) and (iii) of Definition 5.1 now easily follow from (I) and (II); we only need to check condition (ii) of Definition 5.1, namely (up to logical equivalence) that $\mathcal{A}^I_E \models \forall a\,(Pre(\tau, B)(a) \to B(a))$: since $Pre(\tau, B)$ is logically equivalent to the disjunction $\bigvee_{i=0}^{k} Pre(\tau, P_i)$, the claim follows immediately from (II) and (III).

Let us now prove the converse, i.e. that in case a safety invariant exists, SInv is able to compute one. Recall the proof of Theorem 5.10: given the negation $H$ of a safety invariant for $U$, another negation $L$ of a safety invariant for $U$ is produced in the following way. Define the sequence of $\exists^I$-formulae $L_i$ as follows: (i) $L_0 := \gamma(U)$ and (ii) $L_{i+1} := L_i \vee \gamma(Pre(\tau, L_i))$. Our $L$ is the $L_i$ with the smallest $i$ such that $L_{i+1}$ is $\mathcal{A}^I_E$-equivalent to $L_i$ (the proof of Theorem 5.10 guarantees that such an $i$ exists).

The above recursive definition of $L_i$ is based on the function $\gamma$, which is defined (non-symbolically) by making use of configurations. Precisely, for an $\exists^I$-formula $S$ such that $[\![S]\!] \subseteq [\![H]\!]$, the function $\gamma(S)$ returns an $\exists^I$-formula $K_{a_1} \vee \cdots \vee K_{a_n}$, where $\{a_1, \dots, a_n\} \subseteq [\![H]\!]$ is a minimal set of configurations taken from a basis of $H$ such that $[\![S]\!] \subseteq {\uparrow}a_1 \cup \cdots \cup {\uparrow}a_n$.



Using Proposition 5.14, it is not difficult to see that minimality implies $\gamma(S) \le S$: in fact, the condition $[\![S]\!] \subseteq {\uparrow}a_1 \cup \cdots \cup {\uparrow}a_n$ says that for every $s$ in a basis for $S$ there is an $a_i$ in the basis $\{a_1, \dots, a_n\}$ for $\gamma(S)$ such that $a_i \le s$, but the converse (which is what really matters for us in view of Proposition 5.14(i)) must hold too, by minimality. This can be shown as follows: if any $a_i$ is eliminated, the relation

$[\![S]\!] \subseteq {\uparrow}a_1 \cup \cdots \cup {\uparrow}a_{i-1} \cup {\uparrow}a_{i+1} \cup \cdots \cup {\uparrow}a_n$

no longer holds, hence there is an $s$ from a basis of $S$ such that $a_j \not\le s$ for all $j = 1, \dots, i-1, i+1, \dots, n$. Since, on the contrary, $[\![S]\!] \subseteq {\uparrow}a_1 \cup \cdots \cup {\uparrow}a_n$ holds, we must conclude that $a_i \le s$. Hence for every $a_i$ there is an $s$ in a basis of $S$ such that $a_i \le s$.

Thus $\gamma(S)$ is such that $\gamma(S) \le S$ and $\mathcal{A}^I_E \models S \to \gamma(S)$, i.e. $\gamma(S)$ covers $S$. It is then clear that an appropriate choice of the function ChooseCover in SInv can return precisely the formulae $L_i$, so that they are assigned to the variable $B$ at the $i$-th iteration of the loop of the procedure, thus justifying the claim of the Theorem. □

When ChooseCover(L) = L, i.e. ChooseCover is the identity (indeed, L covers L), the procedure SInv is the exact dual of BReach in Figure 1(a), and hence it can only return (the negation of) a symbolic representation of all backward reachable states as a safety invariant.
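The relationship between BReach and SInv can be illustrated with a small explicit-state sketch in Python. This is a toy model of ours, not the symbolic procedures of Figure 1: finite sets of states stand in for $\exists^I$-formulae, set inclusion and intersection replace the $\mathcal{A}^I_E$-satisfiability tests, and a "cover" is simply any superset of its argument (the ordering $\le$ on configurations has no analogue here). The loops follow the description given in the text and in the proof of Theorem 5.18 (fix-point test, safety test, update of B, pre-image); the names `pre`, `breach`, `sinv` and `choose_cover` are hypothetical. With the identity as `choose_cover`, `sinv` returns exactly the complement of the backward reachable states; a coarser cover may make it fail, which is why the choice of cover matters.

```python
from typing import Callable, FrozenSet, Hashable, Iterable, Optional, Set, Tuple

State = Hashable
StateSet = FrozenSet[State]
Transitions = Set[Tuple[State, State]]


def pre(tau: Transitions, targets: StateSet) -> StateSet:
    """Explicit pre-image: the states having at least one tau-successor in `targets`."""
    return frozenset(s for (s, t) in tau if t in targets)


def breach(init: StateSet, tau: Transitions,
           unsafe: Iterable[State]) -> Tuple[str, Optional[StateSet]]:
    """Explicit-state analogue of BReach; on success it returns the backward reachable set B."""
    p, b = frozenset(unsafe), frozenset()
    while not p <= b:          # fix-point test: "P and not B" still has a model
        if p & init:           # safety test: "I and P" has a model
            return ("unsafe", None)
        b = b | p              # B <- P or B
        p = pre(tau, p)        # P <- Pre(tau, P)
    return ("safe", b)


def sinv(init: StateSet, tau: Transitions, unsafe: Iterable[State],
         choose_cover: Callable[[StateSet], StateSet],
         universe: Iterable[State]) -> Optional[StateSet]:
    """Explicit-state analogue of SInv: returns the invariant "not B", or None on failure."""
    p, b = choose_cover(frozenset(unsafe)), frozenset()
    while not p <= b:
        if p & init:           # this sequence of covers reaches an initial state: fail
            return None
        b = b | p
        p = choose_cover(pre(tau, p))
    return frozenset(universe) - b


if __name__ == "__main__":
    # Toy system: 0 -> 1 -> 2 -> 0 is reachable from 0; 3 -> 4 -> 5 is not.
    init = frozenset({0})
    tau = {(0, 1), (1, 2), (2, 0), (3, 4), (4, 5)}
    verdict, backward = breach(init, tau, {5})
    print(verdict, sorted(backward))                             # safe [3, 4, 5]
    print(sorted(sinv(init, tau, {5}, lambda s: s, range(6))))  # [0, 1, 2]
    print(sinv(init, tau, {5}, lambda s: s | {2}, range(6)))    # None (cover too coarse)
```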



6. Pragmatics of Invariant Synthesis and Experiments 

The main drawback of algorithm SInv (in Figure 1(b), explained in the last section) is the non-determinism of the function ChooseCover. Although finite, the number of formulae covering a given $\exists^I$-formula is so large as to make any concrete implementation of SInv impractical. Instead, we prefer to study how to integrate the synthesis of invariants into the backward reachability algorithm of Figure 1(a). Given that finding a safety invariant could be infeasible through an exhaustive search, we content ourselves with finding invariants tout court and use them to prune the search space of the backward reachability algorithm BReach (in Figure 1(a)).

6.1. Integrating Invariant Synthesis within Backward Reachability. In our symbolic framework, at the n-th iteration of the loop of the procedure BReach, the set of backward reachable states is represented by the formula stored in the variable B (which is equivalent to $BR^n(\tau, U)$). So, 'pruning the search space of the backward reachability algorithm' amounts to disjoining the negation of the available invariants to B. In this way, the extra information encoded in the invariants makes the satisfiability test at line 2 (for fix-point checking) more likely to succeed and possibly decreases the number of iterations of the loop. The problem, of course, is to synthesize such invariants. Let us consider this problem at a very abstract level.

Suppose that a function Choose is available which takes an $\exists^I$-formula P and returns a (possibly empty) finite set S of $\exists^I$-formulae representing 'useful (with respect to P) candidate invariants.' We can integrate the synthesis of invariants within the backward reachability algorithm by adding the following instructions between lines 4 and 5 in Figure 1(a):

4' foreach $C_{INV} \in$ Choose(P) do
     if BReach($C_{INV}$) = (safe, $B_{C_{INV}}$) then B ← B ∨ ¬$B_{C_{INV}}$;

where $C_{INV}$ stands for 'candidate invariant.' The resulting procedure will be called BReach+Inv in the following. Notice that BReach is used here as a sub-procedure of BReach+Inv.
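The integration of line 4' can be sketched in the same explicit-state toy setting (again our own illustration, not mcmt code). Here `breach_bounded` plays the role of the resource-bounded BReach used to test candidates (a bound on loop iterations stands in for a depth limit on the tableau), and `choose` is any function returning candidate sets of states; all names are hypothetical. Our toy BReach returns the backward reachable set itself, whose complement is the certified invariant, so the update disjoins that set (the negation of the invariant) to B. Candidates that turn out to be unsafe, or that exceed the bound, are simply discarded.

```python
from typing import Callable, FrozenSet, Hashable, Iterable, List, Optional, Set, Tuple

State = Hashable
StateSet = FrozenSet[State]
Transitions = Set[Tuple[State, State]]
Result = Tuple[str, Optional[StateSet], int]  # (verdict, backward set B, loop iterations)


def pre(tau: Transitions, targets: StateSet) -> StateSet:
    """Explicit pre-image: the states having at least one tau-successor in `targets`."""
    return frozenset(s for (s, t) in tau if t in targets)


def breach_bounded(init: StateSet, tau: Transitions, unsafe: Iterable[State],
                   max_iter: Optional[int] = None) -> Result:
    """BReach with an optional iteration bound; answers 'unknown' when the bound is hit."""
    p, b, n = frozenset(unsafe), frozenset(), 0
    while not p <= b:
        if p & init:
            return ("unsafe", None, n)
        if max_iter is not None and n >= max_iter:
            return ("unknown", None, n)
        b, p, n = b | p, pre(tau, p), n + 1
    return ("safe", b, n)


def breach_inv(init: StateSet, tau: Transitions, unsafe: Iterable[State],
               choose: Callable[[StateSet], List[StateSet]],
               max_iter_cand: int = 3) -> Result:
    """BReach+Inv: after B <- P or B, check each candidate from choose(P) (line 4')."""
    p, b, n = frozenset(unsafe), frozenset(), 0
    while not p <= b:
        if p & init:
            return ("unsafe", None, n)
        b = b | p
        for cand in choose(p):
            verdict, b_cand, _ = breach_bounded(init, tau, cand, max_iter_cand)
            if verdict == "safe":
                # the complement of b_cand is an invariant; disjoin its negation to B
                b = b | b_cand
        p, n = pre(tau, p), n + 1
    return ("safe", b, n)


if __name__ == "__main__":
    # Toy system: an unreachable chain c9 -> c8 -> ... -> c0 ending in the unsafe c0.
    chain = [f"c{k}" for k in range(10)]
    tau = {(chain[k + 1], chain[k]) for k in range(9)} | {("i", "i")}
    init = frozenset({"i"})
    print(breach_bounded(init, tau, {"c0"})[2])                            # 10
    print(breach_inv(init, tau, {"c0"}, lambda p: [frozenset(chain)])[2])  # 1
```

In the usage example, plain backward search needs ten iterations, whereas a single coarse candidate covering the whole unreachable chain lets the loop reach a fix-point after one iteration; a wrong candidate (one containing an initial state) is rejected and only costs time.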

Proposition 6.1. If the procedure BReach+Inv terminates by returning safe (unsafe), then S is safe (unsafe) with respect to U.

Proof. The claim is trivial when BReach+Inv returns unsafe. Let us consider the situation when the procedure terminates by returning safe at the $(k+1)$-th iteration of the main loop. Observe that the content of the variable B is

$Pre^0(\tau, U) \vee Pre^1(\tau, U) \vee \cdots \vee Pre^k(\tau, U) \vee H_1 \vee \cdots \vee H_m$    (6.1)

at the $(k+1)$-th iteration of the loop, where $H_1, \dots, H_m$ are negations of invariants (see Property 5.3). For reductio, suppose that the system is unsafe, i.e. for some $n \ge 0$, the formula (3.2) (shown here again for readability)

$I(a_n) \wedge \tau(a_n, a_{n-1}) \wedge \cdots \wedge \tau(a_1, a_0) \wedge U(a_0)$

is $\mathcal{A}^I_E$-satisfiable. Assume that the formula is true in a model of $\mathcal{A}^I_E$ with the array assignments $s_n, \dots, s_0$; in the following, we say that $s_n, \dots, s_0$ is a bad trace. We also assume that $s_n, \dots, s_0$ is a bad trace of shortest length. Since the formulae $I \wedge Pre^0(\tau, U)$, $I \wedge Pre^1(\tau, U)$, ..., and $I \wedge Pre^k(\tau, U)$ are all $\mathcal{A}^I_E$-unsatisfiable (see line 3 of Figure 1(a), which is also part of BReach+Inv), it must be $n > k$. Let us now focus on $s_{k+1}$: since BReach+Inv returned safe at iteration $k+1$, it must have exited the loop because the formula currently stored in P (which is $Pre^{k+1}(\tau, U)$) is not $\mathcal{A}^I_E$-satisfiable together with the negation of the formula currently in B (which is (6.1)). Hence, $s_{k+1}$ (which satisfies $Pre^{k+1}(\tau, U)$) must satisfy either some $Pre^l(\tau, U)$ (for $l < k+1$) or some $H_i$, but both alternatives are impossible. In fact, the former would yield a shorter bad trace, whereas the latter contradicts the fact that $s_{k+1}$ is forward reachable from a state satisfying $I$ and, as such, must satisfy the invariant $\neg H_i$. □

The procedure BReach+Inv is

• incomplete, in the sense that it is not guaranteed to terminate even when a safety invariant exists,

• deterministic, since no backtracking is required,

• highly parallelizable: it is possible to run in parallel as many instances of BReach as there are formulae in the set returned by Choose, and

• effective in practice (for appropriate Choose functions; see below for a discussion of the meaning of "appropriate" in this context), as witnessed by the experiments in Section 6.4.

As a result, invariant synthesis becomes a powerful heuristic within a refined version of the basic backward reachability algorithm. Furthermore, its integration in the tableaux calculus of Section 3.3 is particularly easy: just use the calculus itself, with some bounds on the resources (such as a limit on the depth of the tree), to check whether a candidate invariant is a "real" invariant. The crucial point, however, is how to design an appropriate function Choose. There are several possible criteria leading to a variety of implementations for Choose, and the usefulness of the resulting functions is likely to depend on the application. Despite the complexity of the design space, it is possible to identify a minimal requirement on Choose by taking into account the tableaux calculus introduced in Section 3.3. To this end, recall that backward reachable sets of states are described by primitive differentiated formulae and that a formula P representing a pre-image is eagerly expanded into disjunctions of primitive differentiated formulae by using the Beta rule. Thus, a reasonable implementation of Choose should be such that Choose(P) = S, where S is a set of primitive differentiated formulae such that each $Q' \in S$ is implied by a disjunct $Q$ occurring in the disjunction of primitive differentiated formulae obtained as the expansion of P. In this way, each $Q' \in S$ can be seen as a tentative over-approximation of $Q$. (Notice that guessing a candidate invariant can be seen as a form of abstraction.) All the implementations of the function Choose in mcmt satisfy the minimal requirement above and can be selected by appropriate command line options and directives to be included in the input file (the interested reader is referred to the user manual available in the distribution for details). We now describe two types of abstraction that lead to different implementations of the function Choose available in the current release of mcmt.



6.2. Index Abstraction. Index abstraction amounts to eliminating some index variables; if done in the appropriate way, this is equivalent to replacing configurations with sub-configurations (as discussed in Section 5). Thus, it is possible to design approximations (quite loose, but suitable for implementation) of the procedure suggested in the proof of Theorem 5.18. An idea (close to what is implemented in the current release of mcmt) is to follow the suggestions in Example 5.17 so as to satisfy the minimal requirement on Choose discussed above. More precisely, given $Q := \exists\underline{k}.\,\theta(\underline{k}, a[\underline{k}])$, we first try to transform it into the form of (5.9), i.e.

$\exists\underline{i}\,\exists\underline{j}.\,(\psi_E(a[\underline{i}], a[\underline{j}]) \wedge \psi_I(\underline{i}, \underline{j}) \wedge \delta_I(\underline{i}))$.

To do this, we decompose $\underline{k}$ into two disjoint sub-sequences $\underline{i}$ and $\underline{j}$ (so that $\underline{k} = \underline{i} \cup \underline{j}$) according to some criteria: if the conjunction of $\Sigma_I(\underline{i})$-literals occurring in $\theta$ is maximal, we get a candidate invariant by returning the corresponding $\exists^I$-formula (5.10), i.e.

$\exists\underline{i}\,(\delta_I(\underline{i}) \wedge \phi_E(a[\underline{i}]))$.

This is computationally feasible in many situations. For example, quantifier elimination reduces to a trivial substitution if $T_E$ is an enumerated data-type theory and the $\Sigma_E$-literals in $\theta$ (i.e. those in $\psi_E$) are all positive. The maximality of $\theta$ is guaranteed (by its being differentiated) if $T_I$ is the theory of finite sets. Another case in which maximality of $\theta$ is guaranteed is when $T_I$ is the theory of linear orders and either $\underline{i} = i_1$, or $\underline{i} = i_1, i_2$ and $\theta$ contains the atom $i_1 < i_2$. In more complex cases, it is possible to obtain a useful formula (similar to (5.10)) in a purely syntactic and computationally cheap way. There is no risk in using methods giving very coarse approximations, since a candidate invariant is used for pruning the search space of the backward reachability procedure only if it has been proved to be a "real" invariant (see also Remark 6.2 below).

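A purely syntactic, computationally cheap variant of index abstraction, of the kind mentioned at the end of the paragraph above, can be sketched as follows. We assume a hypothetical representation of a primitive differentiated formula as an existential prefix plus a set of literals, each annotated with the index variables it mentions (the text does not describe mcmt's internal data structures). Keeping a subset of the index variables and discarding every literal that mentions another one only removes conjuncts, so each candidate is implied by the original disjunct, as the minimal requirement on Choose demands. Unlike the construction of Example 5.17, this sketch neither checks the maximality of $\delta_I$ nor applies quantifier elimination to the element literals, so it is coarser.

```python
from dataclasses import dataclass
from itertools import combinations
from typing import FrozenSet, List, Tuple


@dataclass(frozen=True)
class Literal:
    text: str                 # the literal as plain text (for display only)
    indices: FrozenSet[str]   # index variables occurring in the literal


@dataclass(frozen=True)
class Cube:
    """A primitive formula: exists <indices> . (conjunction of `literals`)."""
    indices: Tuple[str, ...]
    literals: FrozenSet[Literal]

    def __str__(self) -> str:
        body = " & ".join(sorted(literal.text for literal in self.literals)) or "true"
        return f"exists {' '.join(self.indices)}. ({body})"


def drop_indices(q: Cube, keep: Tuple[str, ...]) -> Cube:
    """Keep the index variables in `keep` and only the literals mentioning no other index.

    Only conjuncts are removed, and the dropped variables occur in none of the kept
    literals, so `q` implies the result: the result over-approximates `q`.
    """
    allowed = frozenset(keep)
    kept = frozenset(literal for literal in q.literals if literal.indices <= allowed)
    return Cube(tuple(keep), kept)


def choose_index_abstraction(q: Cube, size: int) -> List[Cube]:
    """Hypothetical Choose: one candidate for each choice of `size` index variables."""
    return [drop_indices(q, keep) for keep in combinations(q.indices, size)]


def lit(text: str, *indices: str) -> Literal:
    """Convenience constructor for the examples."""
    return Literal(text, frozenset(indices))


if __name__ == "__main__":
    q = Cube(("i", "j"), frozenset({
        lit("i < j", "i", "j"),
        lit("a[i] = wait", "i"),
        lit("b[i] = 0", "i"),
        lit("a[j] = crit", "j"),
    }))
    for cand in choose_index_abstraction(q, 1):
        print(cand)  # exists i. (a[i] = wait & b[i] = 0), then exists j. (a[j] = crit)
```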


6.3. Signature Abstraction. Index abstraction can be useless, or computationally too expensive (if done precisely), in several applications. Even worse, when $T_E$ is not locally finite, the related notion of sub-configuration loses most of its relevance. In these cases, other forms of abstraction inspired by predicate abstraction [44] may be of great help. Although predicate abstraction with refinement (as in the CEGAR loop) is not yet implemented in mcmt, the tool features a technique for invariant synthesis that we have called signature abstraction, which can be seen as a simplified version of predicate abstraction. This technique uses quantifier elimination (whenever possible) to eliminate the literals containing a selected subset X of the set of array variables. The subset X can either be suggested by the user or dynamically built by the tool from the shape of the disjunct belonging to the pre-image currently being computed. Again, the elimination is applied to each of the primitive differentiated disjuncts of the currently computed pre-image P to obtain the differentiated formulae that form the set returned by Choose. It is easy to see that this way of implementing the function Choose satisfies the minimal requirement discussed above.
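Signature abstraction can be sketched in the same style, again under assumptions of ours: each literal is annotated with the array variables it mentions, and, instead of quantifier elimination, every literal containing an array variable in X (the parameter `hidden` below) is simply dropped. Dropping conjuncts yields a weaker, hence coarser, formula than quantifier elimination would, but it still satisfies the minimal requirement on Choose. All names are hypothetical.

```python
from dataclasses import dataclass
from typing import FrozenSet, Iterable, List, Set


@dataclass(frozen=True)
class Literal:
    text: str                # the literal as plain text (for display only)
    arrays: FrozenSet[str]   # array variables occurring in the literal


Disjunct = FrozenSet[Literal]  # conjunction of literals of one primitive disjunct


def signature_abstraction(pre_image: Iterable[Disjunct], hidden: Set[str]) -> List[Disjunct]:
    """Hypothetical Choose: drop every literal that mentions an array variable in `hidden`.

    Dropping conjuncts is a coarser stand-in for quantifier elimination; each result
    is implied by the disjunct it comes from (the minimal requirement on Choose).
    """
    candidates: List[Disjunct] = []
    for disjunct in pre_image:
        kept = frozenset(literal for literal in disjunct if not (literal.arrays & hidden))
        # an empty conjunction ("true") is a useless candidate; also skip duplicates
        if kept and kept not in candidates:
            candidates.append(kept)
    return candidates


def lit(text: str, *arrays: str) -> Literal:
    """Convenience constructor for the examples."""
    return Literal(text, frozenset(arrays))


if __name__ == "__main__":
    pre_image = [
        frozenset({lit("a[i] = crit", "a"), lit("b[i] > 0", "b"), lit("c[i] = 1", "c")}),
        frozenset({lit("a[i] = wait", "a"), lit("b[i] = 0", "b")}),
    ]
    for cand in signature_abstraction(pre_image, {"b"}):
        print(sorted(x.text for x in cand))  # ['a[i] = crit', 'c[i] = 1'], then ['a[i] = wait']
```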

Remark 6.2. The reader may wonder whether the use of abstraction techniques can have a negative impact on the correctness of mcmt's outcome. We emphasize that this is not the case because of the way the candidate invariants are used to prune the search space during backward reachability. In fact, abstraction is used only to generate the candidate invariants, which are then tested to be "real" invariants by a resource-bounded version of backward reachability. Only if candidate invariants pass this test are they used to prune the search of backward reachable states. In other words, the answer supplied by mcmt to a safety problem is always correct: as is clear from the proof of Proposition 6.1, the set of backward reachable states can be augmented if invariants are used during backward search, but it is augmented only by states satisfying the negation of an invariant (these states are not forward reachable, hence they cannot alter safety checks). As a consequence, safety tests remain exhaustive, although resources (such as computation time) may be wasted in checking candidate invariants that turn out not to be "real" invariants, or not to be useful to significantly prune the search space.



6.4. Experiments. To show the flexibility and the performance of mcmt, we have built a library of benchmarks in the format accepted by our tool by translating safety problems from a variety of sources. More precisely, our sources were the following:

• parametrised systems from the distributions of the infinite state model checkers PFS (http://www.it.uu.se/research/docs/fm/apv/tools/pfs) and Undip (http://www.it.uu.se/research/docs/fm/apv/tools/undip),

• parametrised and distributed systems from the literature on the invisible invariant method (see, e.g., [12]),

• imperative programs manipulating arrays (such as sorting or string manipulation) taken from standard books about algorithms,

• imperative programs manipulating numeric variables from the distribution of the model checker ARMC (http://www7.in.tum.de/~rybal/armc),

• protocols from the distribution of Murφ extended with predicate abstraction (http://verify.stanford.edu/satyaki/research/PredicateAbstractionExamples.html).

We did not try to be exhaustive in the selection of problems but rather to pick problems from the widest possible range of classes of infinite state systems, so as to substantiate the claim about the flexibility of our tool. All the files in mcmt format are contained in the mcmt distribution, which is available at the tool web page (http://homes.dsi.unimi.it/~ghilardi/mcmt). Each file comes with an indication of the source from which it has been adapted and a brief informal explanation of its content.

We divided the problems into four categories: mutual exclusion protocols and cache coherence protocols, taken mainly from the distributions of PFS and Undip (see Tables 1 and 2), imperative programs manipulating arrays (see Table 3), and heterogeneous problems (see Table 4) taken from the remaining sources listed above. For the first two categories, the benchmark set is sufficiently representative, whereas for the last two categories only some interesting examples have been submitted to the tool. For each category, we tried the tool in two configurations: one, called "Default Setting," is the standard setting used when mcmt is invoked without any option, and the other, called "Best Setting," is the result of some experimentation with various heuristics for invariant synthesis, signature abstraction, and acceleration. It is possible that for some problems the "real" best setting is still to be identified and that the results reported here can be further improved.

In Tables 1, 2, 3 and 4, the column 'd' is the depth of the tableaux obtained by applying the rules listed in Section 3.3, '#n' is the number of nodes in the tableaux, '#d' is the number of nodes which are deleted because they are subsumed by the information contained in the others (see [40] for details about this point), '#SMT' is the number of invocations to Yices



36 S. GHILARDI AND S. RANISE 



Table 1: Mutual exclusion protocols

                            Default Setting             |               Best Setting
Problem             d     #n    #d     #SMT      time   |  d     #n    #d     #SMT  #inv.      time
Bakery              2      1     -        6      0.00   |  2      1     -        6      -      0.00
Bakery_bogus        8     90    14     1413      0.81   |  8     53     4     1400      7      0.68
Bakery_e           12     48    17      439      0.20   |  7      8     1      213     16      0.10
Bakery_Lamport     12     56    15      595      0.27   |  4      7     1      209      7      0.08
Bakery_t            9     28     5      251      0.11   |  7      8     1      134      5      0.06
Burns              14     56     7      373      0.14   |  2      2     1       53      3      0.02
Dijkstra           14    122    37     2920      2.11   |  2      1     1      215     12      0.08
Dijkstral          13     38    11      222      0.10   |  2      1     1       35      2      0.02
Distrib_Lamport    23    913   242    47574    120.62   | 23    248    42    19254      7     32.84
Java M-lock         9     23     2      289      0.10   |  9     23     2      289      -      0.10
Mux_Sem             7      8     2       57      0.02   |  2      1     1       65      6      0.02
Rickart_Agrawala   13    458   119    35355    187.04   | 13    458   119    35355      -    187.04
Sz_fp              22    277     3     7703      5.12   | 22    277     3     7703      -      5.12
Sz_fp_ver          30    284    38    10611      6.66   | 30    284    38    10611      -      6.66
Szymanski          17    136    10     2529      1.60   |  9     14     5      882     12      0.30
Szymanski_at       23   1745   311   424630    540.19   |  9     22    10     2987     42      1.25
Ticket              9     18     -      284      0.17   |  9     18     -      284      -      0.17


during backward reachability to solve fix-point and safety checks, '#inv.' is the number of invariants found by the available invariant synthesis techniques (see also [39] for a more in-depth discussion of some of these issues),¹⁶ and 'time' is the total amount of time (in seconds) taken by the tool to solve the safety problem. Timings were obtained on an Intel Centrino 1.729 GHz with 1 Gbyte of RAM running Linux Gentoo. In some cases, the system seemed to diverge, as it clearly entered a loop: it kept applying the same sequence of transitions. In these cases, we stopped the system, left the corresponding line of the table empty, and put 'timeout' in the last column.

As is apparent from the Tables, gaining some expertise in using the available options of the tool may give dramatic improvements in performance, either by reducing timings or by getting the system to terminate. For the category "Mutual exclusion protocols," invariant synthesis helps to reduce the solving time for the larger examples. For the category "Cache coherence protocols," the effect of invariant synthesis, as well as of the other techniques, is negligible. For the category "Imperative programs," invariant synthesis techniques are the key to making the tool terminate on almost all problems. In particular, signature abstraction, introduced in this latest version of the tool, is a crucial ingredient.
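To put numbers on these improvements, the short Python sketch below computes the speed-up (Default Setting time divided by Best Setting time) for some rows of Table 1, using only the timings reported there; the names `TABLE1_TIMES` and `speedup` are ours. Rows whose Best Setting time is 0.00 are left out to avoid division by zero, and since times are reported with only two decimals, ratios for the fastest problems are only indicative.

```python
# (Default Setting time, Best Setting time) in seconds, copied from Table 1
TABLE1_TIMES = {
    "Bakery_e": (0.20, 0.10),
    "Bakery_Lamport": (0.27, 0.08),
    "Burns": (0.14, 0.02),
    "Dijkstra": (2.11, 0.08),
    "Distrib_Lamport": (120.62, 32.84),
    "Szymanski": (1.60, 0.30),
    "Szymanski_at": (540.19, 1.25),
}


def speedup(default_s: float, best_s: float) -> float:
    """Ratio between the Default Setting and the Best Setting running times."""
    return default_s / best_s


if __name__ == "__main__":
    # print the rows from the largest to the smallest speed-up
    for name, (d, b) in sorted(TABLE1_TIMES.items(), key=lambda kv: -speedup(*kv[1])):
        print(f"{name:16s} {speedup(d, b):8.1f}x")
```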

A comparative analysis is somewhat difficult in the absence of a standard for the specification of safety problems. This situation is similar to the experimental evaluation of SMT solvers before the introduction of the SMT-LIB standard [53]. It would be interesting to investigate whether the proposed format could become the new interlingua for infinite state model checkers, so that both the exchange of problems and the fair comparison of performance become possible. Just to give an idea of the relative performance of our tool, we only mention that mcmt



16 In the "Default Setting" part of the tables, the column labelled '#inv.' is not present because mcmt's default is to turn off invariant synthesis.
Table 2: Cache coherence protocols

                            Default Setting             |               Best Setting
Problem             d     #n    #d     #SMT      time   |  d     #n    #d     #SMT  #inv.      time
Berkeley            2      1     -       16      0.00   |  2      1     -       16      -      0.00
Futurebus           8     37     3      998      0.96   |  8     37     3      998      -      0.96
German07           26   2442   576   121388    145.68   | 26   2442   576   121388      -    145.68
German_buggy       16   1631   203    41497     49.70   | 16   1631   203    41497      -     49.70
German_ca           9     13     -       62      0.03   |  9     13     -       62      -      0.03
German_pfs         33  11605  2755   858184   31m 01s   | 33  11141  2673   784168    149   30m 27s
Illinois            4      8     -      144      0.08   |  4      8     -      144      -      0.08
Illinois_ca         3      3     1       48      0.02   |  3      3     1       48      -      0.02
Mesi                3      2     -        9      0.00   |  3      2     -        9      -      0.00
Mesi_ca             3      2     -       13      0.00   |  3      2     -       13      -      0.00
Moesi               3      2     -       10      0.01   |  3      2     -       10      -      0.01
Moesi_ca            3      2     -       13      0.00   |  3      2     -       13      -      0.00
Synapse             2      1     -       16      0.01   |  2      1     -       16      -      0.01
Xerox P.D.          7     13     -      388      0.23   |  7     13     -      388      -      0.23



performs better than the model checkers PFS and Undip on the problems taken from their distributions, and clearly outperforms them on the largest benchmarks. In addition, these two systems are not capable of handling many of the problems considered here, such as those listed in the category "Imperative Programs" (their input syntax and the theoretical framework they are based on are too restrictive to accept them).

7. Discussion

We have given a comprehensive account of our approach to the model checking of safety properties of infinite state systems manipulating array variables by SMT solving. The idea of using arrays to represent system states is not new in model checking (see in particular [55, 51]); what seems to be new in our approach is the fully declarative characterization of both the topology and the (local) data structures of systems by using theories. This has two advantages. First, implementations of our approach can handle a wide range of topologies without modifying the underlying data structures representing sets of states. This is in contrast with recently developed techniques [2, 3] for the uniform verification of parametrized systems, which consist in exploring the state space of a system by using a



Table 3: Imperative Programs

                            Default Setting             |               Best Setting
Problem             d     #n    #d     #SMT      time   |  d     #n    #d     #SMT  #inv.      time
Find                4     27     7      691      0.90   |  4     27     7      691      -      0.90
Max_in_Array        -      -     -        -   timeout   |  2      1     1       46      5      0.03
Selection_Sort      -      -     -        -   timeout   |  5     13     2     1141     11      0.62
Strcat              -      -     -        -   timeout   |  2      2     2       80      2      0.07
Strcmp              -      -     -        -   timeout   |  2      1     1       21      3      0.01
Strcopy             3      3     1      694      1.22   |  3      3     2      564      4      0.38






Table 4: Miscellanea

                            Default Setting             |               Best Setting
Problem             d     #n    #d     #SMT      time   |  d     #n    #d     #SMT  #inv.      time
Alternating-bit     -      -     -        -   timeout   | 21   1008   156    41894      1     44.48
Bakery              6     12     -       86      0.04   |  6     12     -       86      -      0.04
Bakery 2            6     22     1      247      0.07   |  6     22     1      247      -      0.07
Controller          6      8     -       95      0.03   |  6      8     -       95      -      0.03
Csm                 -      -     -        -   timeout   |  2      2     2       76      1      0.02
Filter_simple       -      -     -        -   timeout   |  2      4     4     1013    132      3.94
Fischer            10     16     2      336      0.16   | 10     16     2      336      -      0.16
Fischer_U           8     13     3      198      0.08   |  8     13     3      198      -      0.08
German             26   2642   678   157870    191.39   | 26   2642   678   157870      -    191.39
Ins_sort            -      -     -        -   timeout   |  2      2     1       40      1      0.04
MTC                 -      -     -        -   timeout   |  1      -     -     1261     95      0.85
Mux_Sem             7     15     -      174      0.04   |  7     15     -      174      -      0.04
Mux_Sem_param       4      5     -       85      0.04   |  2      3     1       57      4      0.02
Order               3      3     -       18      0.01   |  2      2     2       16      2      0.01
Simple              2      1     -       10      0.00   |  2      1     -       10      -      0.00
Swimming_Pool       3     81     -     1300      0.67   |  3     62     3      927      -      0.73
Szymanski+         21    685   102    43236     47.00   |  2      1     1       90      2      0.04
Tickets             -      -     -        -   timeout   |  3      4     2      201     10      0.06
Token_Ring          3      2     -       30      0.02   |  3      2     -       30      -      0.02
Tricky              8      7     -       22      0.02   |  2      1     1       13      1      0.00
Two_Semaphores      4      5     1       48      0.02   |  4      5     1       48      -      0.02



finitary representation of (infinite) sets of states and require substantial modifications in the computation of the pre-image to adapt to different topologies. Second, since SMT solvers are capable of handling several theories in combination, we can avoid encoding everything in one theory, which has already been shown to be detrimental to performance in [19, 18]. SMT techniques were already employed in model checking [MJ [9], but only in the bounded case (whose aim is mostly limited to finding bugs, not full verification).

In more detail, our contributions are the following. First, we have explained how to use certain classes of first-order formulae to represent sets of states and identified the requirements to mechanize a fully symbolic and declarative version of backward reachability. Second, we have discussed sufficient conditions for the termination of the procedure, stated on the theories used to specify the topology (indexes) and the data (elements) manipulated by the array-based system. Third, we have argued that these classes of formulae allow us to specify a variety of parametrized and distributed systems, as well as imperative algorithms manipulating arrays. Finally, we have studied invariant synthesis techniques and their integration in the backward reachability procedure. Theoretically, we have given sufficient conditions on the theories of indexes and elements of the array-based system for the completeness of invariant synthesis. Pragmatically, we have described how to interleave invariant guessing and backward reachability so as to improve the termination of the latter. We have implemented the proposed techniques in mcmt and evaluated their viability on several benchmark problems extracted from a variety of sources. The experimental results have confirmed the efficiency and flexibility of our approach.
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7.1. Related work. We now discuss the main differences and simiiarities with existing 
approaches to the verification of safety properties of infinite state systems. We believe it is 
convenient to recall two distinct and complementary approaches among the many possible 
alternatives available in the literature. In examining related works, we do not attempt to 
be exhaustive (we consider this an almost desperate task given the huge amount of work 
in this area) but rather to position our approach with respect to some of the main lines of 
research in the field. 

The first approach is pioneered in [1] and its main notion is that of well-structured
system. This approach was implemented in two systems (see, e.g., [2, 3]), which were able
to automatically verify several protocols for mutual exclusion and cache coherence. One
of the key ingredients of the success of these tools is their capability to perform accurate
fix-point checks so as to reduce the number of iterations of the backward search procedure.
A fix-point check is implemented by 'embedding' an old configuration (i.e., a finite
representation of a potentially infinite set of states) into a newly computed pre-image; if
such an embedding exists, then the new pre-image is considered "redundant" (i.e., not
contributing new information about the set of backward reachable states) and thus can be
discarded without loss of precision. However, the exhaustive enumeration of embeddings has
a high computational cost. An additional problem is that constraints are only used to
represent the data manipulated by the system, while its topology is encoded by ad hoc data
structures. This requires implementing from scratch the algorithms that compute both
pre-images and embeddings each time the topology of the systems to verify is modified. On
the contrary, mcmt uses particular classes of first-order formulae to represent
configurations parametrised with respect to a theory of the data and a theory of the
topology of the system, so that pre-image computation reduces to a fixed set of logical
manipulations and fix-point checking reduces to solving SMT problems containing universally
quantified variables. To mechanize these tests, a quantifier-instantiation procedure is
used, which is the logical counterpart of the enumeration of "embeddings." Interestingly,
this notion of "embedding" can be recaptured via classical model theory (see [37] or
Section 4 above) in the logical framework underlying mcmt, a fact that allows us to import
into our setting the decidability results of [1] for backward reachability. Another
important advantage of the approach underlying mcmt over that proposed in [1] is its
broader scope of applications with respect to the implementations in [2, 3]. The use of
theories for specifying the data and the topology allows one to model disparate classes of
systems in a natural way. Furthermore, even if the quantifier-instantiation procedure
becomes incomplete with rich theories, it can soundly be used and may still allow one to
prove the safety of a system. In fact, mcmt has been successfully employed to verify
sequential programs (such as sorting algorithms) that are far beyond the reach of the
systems described in [2, 3].

The second and complementary approach to model checking infinite state systems relies
on predicate abstraction techniques, initially proposed in [Hj. The idea is to abstract
the system to one with finitely many states, to perform finite-state model checking, and to
refine spurious traces (if any) by using decision procedures or SMT solvers. This technique
has been implemented in several tools and is often combined with interpolation algorithms
for the refinement phase. As pointed out in [34, 46], predicate abstraction must be
carefully adapted when (universal) quantification is used to specify the transitions of the
system or its properties, as is the case for the problems tackled by mcmt. There are two
crucial problems to be solved. The first is to find an appropriate set of predicates to
compute the abstraction of the system. In fact, besides system variables, universally
quantified variables may also occur in the system. The second problem is that the
computation of the abstraction, as well as its refinement, requires solving proof
obligations containing universal quantifiers. Hence, we need to perform suitable quantifier
instantiation in order to enable the use of decision procedures or SMT solving techniques
for quantifier-free formulae. The first problem is solved by Skolemization [34] or by fixing
the number of variables in the system [46], so that standard predicate abstraction
techniques can still be used. The second problem is solved by adopting very straightforward
(sometimes naive) and incomplete quantifier-instantiation procedures. While being
computationally cheap and easy to implement, the heuristics used for quantifier
instantiation are largely imprecise and do not permit the detection of redundancies due to
variable permutations, internal symmetries, and so on. Experiments performed with mcmt,
tuned to mimic these simple instantiation strategies, show much poorer performance. We
believe that the reasons for the success of the predicate abstraction techniques in
[34, 46] lie in the clever heuristics used to find and refine the set
of predicates for the abstraction. The current implementation of mcmt is orthogonal to
the predicate abstraction approach: it features an extensive quantifier instantiation (which
is complete for the theories over the indexes satisfying Hypothesis (I) from Theorem 3.3
and is enhanced with completeness-preserving heuristics to avoid useless instances), but
it performs only a primitive form of predicate abstraction, called signature abstraction
(see Section 6.3). Another big difference is how abstraction is used in mcmt: the set of
backward reachable states is always computed precisely, while abstraction is only exploited
for guessing candidate invariants, which are then used to prune the set of backward
reachable states. Since we represent sets of states by formulae, guessing and then using
the synthesized invariants turns out to be extremely easy, thereby helping to resolve the
tension between model checking and deductive techniques that has been extensively discussed
in the literature and is still problematic in the tools described in [3], where sets of
states are represented by ad hoc data structures.

Besides the two main approaches mentioned above, there is a third line of research
in the area that applies constraint solving techniques to the model checking of infinite
state systems. One of the first attempts was described in [19] and then further studied
in [18]. The idea was to use composite constraint domains (such as integers and Booleans)
to encode the data and the control flow of, for example, instances of parametrised systems.
Compared to our framework, the verification methods in [19, 18] are not capable of checking
safety regardless of the number of processes in a system, but only support the verification
of its instances. Moreover, increasing the number of processes quickly degrades performance.
Babylon is a tool for the verification of counting abstractions of parametrized systems (e.g.,
multithreaded Java programs [28]). It uses a graph-based data structure to encode
disjunctive normal forms of integer arithmetic constraints. Computing pre-images requires
computationally expensive normalization, which is not needed for us, as SMT solvers
efficiently handle arbitrary integer constraints. Brain is a model checker for transition
systems with finitely many integer variables which uses an incremental version of Hilbert's
bases to efficiently perform entailment or satisfiability checking of integer constraints
(the results reported in [56] show that it scales very well). Taking $T_I$ to be an
enumerated datatype theory, the notion of array-based system considered in this paper
reduces to the one used by Brain. However, many of the systems that can be modelled as
array-based systems cannot be handled by Brain. Another interesting proposal for the uniform
verification of parametrized systems using constraint solving techniques is [15], where a
decidability result for Sg-formulae is derived (these are ∃∀-formulae roughly corresponding
to those covered by Theorem 3.3 above, for the special case in which the models of the
theory $T_I$ are all the finite linear orders). While the representation of states in [15]
is (fully) declarative, transitions are not, as a rewriting semantics (with constraints) is
employed. Since transitions are not declaratively handled, the task of proving pre-image
closure becomes non-trivial; in [15], pre-image closure of X^-formulae under transitions
encoded by S^-formulae ensures the effectiveness of the tests for inductive invariants and
bounded reachability analysis, but not for fix-point checks. In our approach, an easy (but
orthogonal) pre-image closure result for existential state descriptions (under certain
X^-formulae representing transitions) gives the effectiveness of fix-point checks, thus
allowing the implementation of backward search.

7.2. Future work. We envisage developing the work described here in three directions.
First, we plan to enhance the implementation of the signature abstraction technique in
future releases of mcmt. The idea is to find the best trade-off between the advantages of
predicate abstraction and extensive quantifier instantiation. Another aspect is the design
of methods for the dynamic refinement of the abstraction along the lines of the
counterexample-guided abstraction refinement (CEGAR) loop [44]. A complementary approach
could be to use techniques for the automatic discovery of relationships among values of
array elements developed in abstract interpretation (see, e.g., [13]). Second, we want to
perform more extensive experiments on different classes of systems. For example, we have
already started to investigate parametrised timed automata (introduced in [B]) with mcmt and
found encouraging preliminary results [20]. Another class of problems on which successful
experiments have been performed with mcmt concerns the verification of fault-tolerant
distributed algorithms [HJ [7]. The third line of future research consists in exploring
further, and then implementing, the verification method for a sub-class of liveness
properties of array-based systems sketched in [37].
Appendix A. Omitted Proofs



Decidability of restricted satisfiability checking. The following result is a simple
generalization of Theorem 3.3 (of Section 3.2).

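Theorem A.1. The $\mathcal{A}_I^E$-satisfiability of a sentence of the kind

$$\exists a_1 \cdots \exists a_n\, \exists \underline{i}\, \exists \underline{e}\, \forall \underline{j}\; \psi(\underline{i}, \underline{j}, \underline{e}, a_1[\underline{i}], \ldots, a_n[\underline{i}], a_1[\underline{j}], \ldots, a_n[\underline{j}]) \tag{A.1}$$

is decidable. Moreover, the following conditions are equivalent:

(i) the sentence (A.1) is $\mathcal{A}_I^E$-satisfiable;

(ii) the sentence (A.1) is satisfiable in a finite index model of $\mathcal{A}_I^E$;

(iii) the sentence

$$\exists a_1 \cdots \exists a_n\, \exists \underline{i}\, \exists \underline{e}\, \bigwedge_{\sigma} \psi(\underline{i}, \underline{j}\sigma, \underline{e}, a_1[\underline{i}], \ldots, a_n[\underline{i}], a_1[\underline{j}\sigma], \ldots, a_n[\underline{j}\sigma]) \tag{A.2}$$

is $\mathcal{A}_I^E$-satisfiable (here $\sigma$ ranges over the substitutions mapping the variables $\underline{j}$ into the
set of representative $\Sigma_I(\underline{i})$-terms).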

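Proof. In order to avoid difficulties with the notation, we consider the case where $n = 1$
only (the reader may check that there is no loss of generality in that).$^{17}$ We first show that
the $\mathcal{A}_I^E$-satisfiability of

$$\exists a\, \exists \underline{i}\, \exists \underline{e}\, \forall \underline{j}\; \psi(\underline{i}, \underline{j}, \underline{e}, a[\underline{i}], a[\underline{j}]) \tag{A.3}$$

is equivalent to the $\mathcal{A}_I^E$-satisfiability of

$$\exists a\, \exists \underline{i}\, \exists \underline{e}\, \bigwedge_{\sigma} \psi(\underline{i}, \underline{j}\sigma, \underline{e}, a[\underline{i}], a[\underline{j}\sigma]) \tag{A.4}$$

where $\sigma$ ranges over the substitutions mapping the variables $\underline{j}$ into the set of representative
$\Sigma_I(\underline{i})$-terms.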
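To make the instantiation in (A.4) concrete, the following Python sketch enumerates the substitutions $\sigma$ and builds the finite conjunction as a string. It is only an illustration, not the procedure implemented in mcmt: terms and formulae are plain strings, and the helper names `substitutions` and `instantiate` are our own. It assumes that the list of representative $\Sigma_I(\underline{i})$-terms is already given; for instance, if $\Sigma_I$ contains no function symbols and no constants, these terms are just the variables $\underline{i}$.

```python
from itertools import product
from typing import Callable, Dict, List, Sequence

# Hypothetical representation: a substitution maps each universally
# quantified index variable (a string) to a representative term (a string).
Substitution = Dict[str, str]


def substitutions(j_vars: Sequence[str], rep_terms: Sequence[str]) -> List[Substitution]:
    """Return every map sigma from j_vars into rep_terms.

    There are len(rep_terms) ** len(j_vars) such maps, one for each choice
    of a representative term for each variable in j_vars.
    """
    return [dict(zip(j_vars, choice))
            for choice in product(rep_terms, repeat=len(j_vars))]


def instantiate(psi: Callable[[Substitution], str],
                j_vars: Sequence[str],
                rep_terms: Sequence[str]) -> str:
    """Replace 'forall j. psi' by the conjunction of psi(j sigma) over all sigma."""
    return " & ".join(f"({psi(sigma)})" for sigma in substitutions(j_vars, rep_terms))


# Example: forall j. a[i1] <= a[j], with representative terms i1 and i2.
print(instantiate(lambda s: f"a[i1] <= a[{s['j']}]", ["j"], ["i1", "i2"]))
# (a[i1] <= a[i1]) & (a[i1] <= a[i2])
```

The number of instances grows as the number of representative terms raised to the length of $\underline{j}$, which is one reason why discarding useless instances matters in practice.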

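That $\mathcal{A}_I^E$-satisfiability of (A.3) implies $\mathcal{A}_I^E$-satisfiability of (A.4) follows from trivial
logical manipulations, so let us assume $\mathcal{A}_I^E$-satisfiability of (A.4) and show $\mathcal{A}_I^E$-satisfiability of
(A.3). Let $\mathcal{M}$ be a model of (A.4); we can assign elements in this model to the variables $a, \underline{i}, \underline{e}$
in such a way that (under such an assignment $\alpha$) we have $\mathcal{M}, \alpha \models \bigwedge_\sigma \psi(\underline{i}, \underline{j}\sigma, \underline{e}, a[\underline{i}], a[\underline{j}\sigma])$.
Consider the model $\mathcal{N}$ which is obtained from $\mathcal{M}$ by restricting the interpretation of the
sort INDEX (and of all function and relation symbols for indexes) to the $\Sigma_I$-substructure
generated by the elements assigned by $\alpha$ to the $\underline{i}$: since models of $T_I$ are closed under
substructures, this substructure is a model of $T_I$ and consequently $\mathcal{N}$ is still a model of
$\mathcal{A}_I^E$. Now let $s$ be the restriction of $\alpha(a)$ to the new, smaller index domain and let $\alpha'$ be the
assignment differing from $\alpha$ only in assigning $s$ to $a$ (instead of $\alpha(a)$). Since $\psi$ is quantifier-free
and since, as $\sigma$ varies, the elements assigned to the terms $\underline{j}\sigma$ cover all possible $\underline{j}$-tuples
of elements in the interpretation of the sort INDEX in $\mathcal{N}$, we have $\mathcal{N}, \alpha' \models \forall \underline{j}\, \psi(\underline{i}, \underline{j}, \underline{e}, a[\underline{i}], a[\underline{j}])$.
This shows that $\mathcal{N} \models \exists a\, \exists \underline{i}\, \exists \underline{e}\, \forall \underline{j}\, \psi(\underline{i}, \underline{j}, \underline{e}, a[\underline{i}], a[\underline{j}])$, i.e., that (A.3) holds. Notice that $\mathcal{N}$ is a
finite index model;$^{18}$ hence we have also proved the equivalence between (i) and (ii).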
17 Since existentially quantifying over variables that do not occur in the formula does not affect
satisfiability, we can also assume that the tuple $\underline{i}$ is not empty (this observation is needed if we want
to prevent the structure $\mathcal{N}$ defined below from having an empty index domain).

18 This is because $T_I$ is locally finite and the $\Sigma_I$-reduct of $\mathcal{N}$ is a structure which is generated by
finitely many elements.
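We now need to decide the $\mathcal{A}_I^E$-satisfiability of sentences (A.4). Let $\underline{t}$ be the representative
$\Sigma_I(\underline{i})$-terms and let us put them in bijective correspondence with fresh variables $\underline{l}$ of sort
INDEX; let $\psi_\sigma(\underline{i}, \underline{l}, \underline{e}, a[\underline{i}], a[\underline{l}])$ be the formula obtained by replacing in $\psi(\underline{i}, \underline{j}\sigma, \underline{e}, a[\underline{i}], a[\underline{j}\sigma])$
the $\Sigma_I(\underline{i})$-terms $\underline{j}\sigma$ by the corresponding $\underline{l}$. We first rewrite (A.4) as

$$\exists a\, \exists \underline{i}\, \exists \underline{e}\, \exists \underline{l}\, \Big(\underline{l} = \underline{t} \wedge \bigwedge_\sigma \psi_\sigma(\underline{i}, \underline{l}, \underline{e}, a[\underline{i}], a[\underline{l}])\Big) \tag{A.5}$$

(here $\underline{l} = \underline{t}$ means component-wise equality, expressed as a conjunction).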

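Notice that $T_I$ and $T_E$ are disjoint (they do not even have any sort in common), which
means that $\underline{l} = \underline{t} \wedge \bigwedge_\sigma \psi_\sigma(\underline{i}, \underline{l}, \underline{e}, a[\underline{i}], a[\underline{l}])$ is a Boolean combination of $\Sigma_I$-atoms and of
$\Sigma_E$-atoms (in the latter kind of atoms, the variables for elements are replaced by the terms
$a[\underline{i}], a[\underline{l}]$). This means that our decision problem can be further rephrased in terms of the
problem of deciding the $\mathcal{A}_I^E$-satisfiability of formulae like

$$\psi_I(\underline{j}) \wedge a[\underline{j}] = \underline{d} \wedge \psi_E(\underline{d}, \underline{e}) \tag{A.6}$$

where $\psi_I(\underline{j})$ is a conjunction of $\Sigma_I$-literals and $\psi_E(\underline{d}, \underline{e})$ is a conjunction of $\Sigma_E$-literals.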

Since we are looking for a model of $T_I$, a model of $T_E$ and a function connecting
their domains (the function interpreting the variable $a$), this is a satisfiability problem for
a theory connection (in the sense of [H]);$^{19}$ since the signatures of $T_I, T_E$ are disjoint, the
problem is decided by propagating equalities.$^{20}$ Hence, to decide (A.6), it is sufficient to
apply the following steps:

- guess an equivalence relation $\Pi$ on the index variables $\underline{j}$ (let us assume $\underline{j} = j_1, \ldots, j_n$);

- check $\psi_I(\underline{j}) \cup \{j_k = j_l \mid (j_k, j_l) \in \Pi\} \cup \{j_k \neq j_l \mid (j_k, j_l) \notin \Pi\}$ for $T_I$-satisfiability;

- check $\psi_E(\underline{d}, \underline{e}) \cup \{d_k = d_l \mid (j_k, j_l) \in \Pi\}$ for $T_E$-satisfiability (where $d_k$ is the element
variable equated to $a[j_k]$ in (A.6));

- return 'unsatisfiable' iff, for every possible guess, at least one of the two previous checks fails.

Soundness and completeness of the above procedure are easy to check. □
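The four steps above can be sketched in Python as follows. The $T_I$- and $T_E$-satisfiability checks are passed in as oracle functions, since they depend on the theories at hand. The literal encoding (`("=", x, y)` and `("!=", x, y)` triples), the names `decide_a6`, `set_partitions` and `eq_sat`, and the use of the pure theory of equality as a stand-in oracle for both theories are assumptions made only for illustration. Guessing the equivalence relation $\Pi$ is implemented by enumerating all set partitions of the positions of $\underline{j}$.

```python
from typing import Callable, Iterator, List, Sequence, Tuple

Literal = Tuple[str, str, str]          # ("=", x, y) or ("!=", x, y)
Oracle = Callable[[List[Literal]], bool]


def set_partitions(items: Sequence) -> Iterator[list]:
    """Enumerate all partitions of items, i.e. all equivalence relations Pi."""
    if not items:
        yield []
        return
    first, rest = items[0], items[1:]
    for part in set_partitions(rest):
        for k in range(len(part)):           # add `first` to an existing block
            yield part[:k] + [[first] + part[k]] + part[k + 1:]
        yield [[first]] + part               # or put it in a new block


def decide_a6(j_vars: Sequence[str], d_vars: Sequence[str],
              psi_i: List[Literal], psi_e: List[Literal],
              sat_i: Oracle, sat_e: Oracle) -> bool:
    """Return True iff psi_I(j) & a[j] = d & psi_E(d, e) is satisfiable.

    d_vars[k] plays the role of a[j_vars[k]].  Equalities and disequalities
    go to the index oracle; only equalities are propagated to the element oracle.
    """
    n = len(j_vars)
    pairs = [(k, l) for k in range(n) for l in range(k + 1, n)]
    for part in set_partitions(list(range(n))):
        block = {k: b for b, ks in enumerate(part) for k in ks}
        arrangement = [("=" if block[k] == block[l] else "!=", j_vars[k], j_vars[l])
                       for k, l in pairs]
        propagated = [("=", d_vars[k], d_vars[l])
                      for k, l in pairs if block[k] == block[l]]
        if sat_i(psi_i + arrangement) and sat_e(psi_e + propagated):
            return True
    return False     # every guess failed one of the two checks: unsatisfiable


def eq_sat(literals: List[Literal]) -> bool:
    """Satisfiability oracle for the pure theory of equality (union-find)."""
    parent = {}

    def find(x):
        parent.setdefault(x, x)
        while parent[x] != x:
            parent[x] = parent[parent[x]]
            x = parent[x]
        return x

    for op, x, y in literals:
        if op == "=":
            parent[find(x)] = find(y)
    return all(find(x) != find(y) for op, x, y in literals if op == "!=")


J, D = ["j1", "j2"], ["d1", "d2"]
print(decide_a6(J, D, [("=", "j1", "j2")], [("!=", "d1", "d2")], eq_sat, eq_sat))   # False
print(decide_a6(J, D, [("!=", "j1", "j2")], [("=", "d1", "d2")], eq_sat, eq_sat))   # True
```

The second call succeeds because no disequality is propagated to the element side: distinct indexes may well store equal elements, since the array need not be injective.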



Undecidability of backward reachability. Here, we give the proof of Theorem 4.1 (of
Section 4.2).



Proof. A two-register Minsky machine is a finite set $P$ of instructions (also called a
program) for manipulating configurations seen as triples $(q, m, n)$, where $m, n$ are natural
numbers representing the register contents and $q$ represents the machine location state ($q$
varies over a fixed finite set $Q$). There are four possible kinds of instructions, inducing
transformations on the configurations as explained in Table 5. A $P$-transformation is a
transformation induced by an instruction of $P$ on a certain configuration. For a Minsky
machine $P$, we write $(q, m, n) \to_P^* (q', m', n')$ to say that it is possible to reach configuration
$(q', m', n')$ from $(q, m, n)$ by applying finitely many $P$-transformations. Given a Minsky
machine $P$ and an initial configuration $(q_0, m_0, n_0)$, the problem of checking whether a
configuration $(q', m', n')$ is reachable from $(q_0, m_0, n_0)$ (i.e., whether $(q_0, m_0, n_0) \to_P^* (q', m', n')$
holds or not) is


19 Strictly speaking, one cannot directly apply the results from [11], because in this paper we have
adopted a 'semantic' notion of theory characterized by a class C of models. In other words, the class C of
models is not required to be elementary.

20 This is different from the standard Nelson-Oppen combination, where inequalities must also be
propagated.
Kind   Instruction            Transformation
I      q → (r, 1, 0)          (q, m, n) → (r, m + 1, n)
II     q → (r, 0, 1)          (q, m, n) → (r, m, n + 1)
III    q → (r, −1, 0)[r']     if m ≠ 0 then (q, m, n) → (r, m − 1, n)
                              else (q, m, n) → (r', m, n)
IV     q → (r, 0, −1)[r']     if n ≠ 0 then (q, m, n) → (r, m, n − 1)
                              else (q, m, n) → (r', m, n)

Table 5: Instructions and related transformations for (two-register) Minsky machines



called the (second) reachability (configuration) problem. It is well known$^{21}$ that there exist
a (two-register) Minsky machine $P$ and a configuration $(q_0, m_0, n_0)$ such that the second
reachability configuration problem is undecidable. To simplify matters, we assume that
$m_0 = 0$ and $n_0 = 0$: there is no loss of generality in that, because one can add to the
program $P$ more states and instructions (precisely $m_0 + n_0$ further states and instructions
of type I–II) for the initialization to $m_0, n_0$.
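The semantics of Table 5 can be made executable with a short Python sketch. Since the reachability problem is undecidable, the search below is only bounded: it explores at most `max_steps` transformations and therefore gives a semi-decision, never a proof of non-reachability. The encoding of instructions as tuples `(q, kind, r, r_alt)` and the function names are our own assumptions; instructions of kinds I and II simply ignore `r_alt`.

```python
from collections import deque
from typing import List, Sequence, Tuple

Config = Tuple[str, int, int]            # (location q, register m, register n)
Instr = Tuple[str, str, str, str]        # (q, kind, r, r_alt)


def step(program: Sequence[Instr], conf: Config) -> List[Config]:
    """All configurations obtained from conf by one P-transformation (Table 5)."""
    q, m, n = conf
    succs = []
    for src, kind, r, r_alt in program:
        if src != q:
            continue
        if kind == "I":
            succs.append((r, m + 1, n))
        elif kind == "II":
            succs.append((r, m, n + 1))
        elif kind == "III":
            succs.append((r, m - 1, n) if m != 0 else (r_alt, m, n))
        elif kind == "IV":
            succs.append((r, m, n - 1) if n != 0 else (r_alt, m, n))
    return succs


def reaches_within(program: Sequence[Instr], start: Config,
                   target: Config, max_steps: int) -> bool:
    """Breadth-first search for target using at most max_steps transformations."""
    frontier = deque([(start, 0)])
    seen = {start}
    while frontier:
        conf, depth = frontier.popleft()
        if conf == target:
            return True
        if depth == max_steps:
            continue
        for nxt in step(program, conf):
            if nxt not in seen:
                seen.add(nxt)
                frontier.append((nxt, depth + 1))
    return False


# Example: increment m twice, then decrement it to zero and jump to "done".
P = [("q0", "I", "q1", ""), ("q1", "I", "q2", ""), ("q2", "III", "q2", "done")]
print(reaches_within(P, ("q0", 0, 0), ("done", 0, 0), 5))   # True
print(reaches_within(P, ("q0", 0, 0), ("done", 0, 0), 4))   # False
```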

We build a locally finite array-based system $S_P = (a, I_P, \tau_P)$ and an $\exists^I$-formula $U_{q,m,n}$
such that $S_P$ is unsafe w.r.t. $U_{q,m,n}$ iff the machine $P$ reaches the configuration $(q, m, n)$.
We take as $\Sigma_I$ the signature having two constants $o, o'$ and a binary relation $S$; models of
$T_I$ are the $\Sigma_I$-structures satisfying the axioms

$$\forall i\, \neg S(i, o), \qquad \forall i\, \forall j_1\, \forall j_2\, (S(i, j_1) \wedge S(i, j_2) \to j_1 = j_2),$$
$$S(o, o'), \qquad \forall i_1\, \forall i_2\, \forall j\, (S(i_1, j) \wedge S(i_2, j) \to i_1 = i_2),$$

saying that $S$ is an injective partial function having $o$ in the domain but not in the range.
As $T_E$ we take the enumerated datatype theory relative to the finite set $Q \times \{0, 1\} \times \{0, 1\}$.
Notice that $T_I, T_E$ are both locally finite; in addition, $T_I$ is closed under substructures and
$T_E$ has quantifier elimination.

The idea is to encode a configuration $(q, m, n)$ as any configuration $s$ (in the
formal sense of Subsection 4.1) satisfying the following conditions:

(i) the support of $s_I$ contains a substructure of the kind
$$o = i_0 \to_S o' = i_1 \to_S i_2 \to_S \cdots \to_S i_K$$
for some $K > m, n$ (we write $i \to_S j$ to mean that $(i, j)$ is in the interpretation of the
relational symbol $S$ in $s_I$);

(ii) for all $i$ in the support of $s_I$, if $s(i) = (r, u, v)$, then (a) $r = q$; (b) $u = 1$ iff $i = i_k$ for
some $k \le m$; (c) $v = 1$ iff $i = i_k$ for some $k \le n$.

In case the above conditions (i)–(ii) hold, we say that $s$ bi-simulates $(q, m, n)$.
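Conditions (i)–(ii) can be illustrated on a finite chain. The Python sketch below assumes, for simplicity, that the support of $s_I$ is exactly the chain $i_0, \ldots, i_K$, represented by the positions $0, \ldots, K$ of a list (so that $i_k \to_S i_{k+1}$ holds for consecutive positions and position 0 plays the role of $o$); `encode` and `decode` are hypothetical helper names, not part of mcmt.

```python
from typing import List, Optional, Tuple

Cell = Tuple[str, int, int]   # the value s(i_k) = (r, u, v)


def encode(q: str, m: int, n: int, K: int) -> List[Cell]:
    """Array over the chain i_0 ->_S ... ->_S i_K bi-simulating (q, m, n)."""
    if K <= max(m, n):
        raise ValueError("condition (i) requires K > m, n")
    return [(q, int(k <= m), int(k <= n)) for k in range(K + 1)]


def _ones_prefix(bits: List[int]) -> Optional[int]:
    """p >= 1 if bits is p ones followed by at least one zero, else None."""
    p = 0
    while p < len(bits) and bits[p] == 1:
        p += 1
    if p == 0 or p == len(bits) or any(b != 0 for b in bits[p:]):
        return None
    return p


def decode(cells: List[Cell]) -> Optional[Tuple[str, int, int]]:
    """Return the configuration bi-simulated by cells, or None if there is none."""
    if not cells or any(r != cells[0][0] for r, _, _ in cells):
        return None                                  # condition (ii)(a) fails
    m = _ones_prefix([u for _, u, _ in cells])
    n = _ones_prefix([v for _, _, v in cells])
    if m is None or n is None:
        return None                                  # (ii)(b) or (ii)(c) fails
    return (cells[0][0], m - 1, n - 1)


print(encode("q0", 0, 0, 3))
print(decode(encode("q", 2, 1, 4)))   # ('q', 2, 1)
```

The first call produces $(q_0, 1, 1)$ at position 0 and $(q_0, 0, 0)$ elsewhere, which is exactly the shape required by the initial formula for the configuration $(q_0, 0, 0)$.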
The initial formula $I_P$ is
$$\forall i\, \big((i \neq o \wedge a[i] = (q_0, 0, 0)) \vee (i = o \wedge a[i] = (q_0, 1, 1))\big).$$
Clearly, for every model $\mathcal{M}$ and for every $s \in \mathrm{ARRAY}^{\mathcal{M}}$, the following happens:
(α): $\mathcal{M} \models I_P(s)$ iff $s$ bi-simulates the initial machine configuration $(q_0, 0, 0)$.
We write the transition $\tau_P$ in such a way that for every model $\mathcal{M}$ and for every
$s, s' \in \mathrm{ARRAY}^{\mathcal{M}}$, the following happens:


21 For details and further references, see for instance [21].
(β): if $s$ bi-simulates $(q, m, n)$, then $\mathcal{M} \models \tau_P(s, s')$ iff there is $(q', m', n')$ such that $s'$
bi-simulates $(q', m', n')$ and $(q', m', n')$ is obtained from $(q, m, n)$ by a single $P$-transformation.
This goal is achieved by taking $\tau_P$ to be a disjunction of T-formulae corresponding to the
instructions of $P$. The T-formula corresponding to an instruction $q \to (r, 1, 0)$ of the first kind
is the following:
$$\exists i_1\, \exists i_2\, \exists i_3\, \big(S(i_1, i_2) \wedge S(i_2, i_3) \wedge pr_1(a[i_1]) = q \wedge pr_2(a[i_1]) = 1 \wedge pr_2(a[i_2]) = 0 \wedge pr_2(a[i_3]) = 0 \wedge a' = \lambda j.\, T\big)$$
where
$$T := \text{if } (j = i_2) \text{ then } (r, 1, pr_3(a[j])) \text{ else } (r, pr_2(a[j]), pr_3(a[j])).$$
Instructions $q \to (r, -1, 0)[r']$ of kind (III) are simulated by the following T-formula:
$$\exists i_1\, \exists i_2\, \big(S(i_1, i_2) \wedge pr_1(a[i_1]) = q \wedge pr_2(a[i_1]) = 1 \wedge pr_2(a[i_2]) = 0 \wedge a' = \lambda j.\, F\big)$$
where
$$F := \text{if } (i_1 \neq o \wedge j = i_1) \text{ then } (r, 0, pr_3(a[j])) \text{ else if } (i_1 \neq o \wedge j \neq i_1) \text{ then } (r, pr_2(a[j]), pr_3(a[j])) \text{ else } (r', pr_2(a[j]), pr_3(a[j])).$$
T-formulae for instructions of kinds (II) and (IV) are defined accordingly.
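The effect of the two T-formulae above can be checked on a list-based chain representation (positions $0, \ldots, K$ with $S(k, k+1)$, and position 0 playing the role of $o$). The Python sketch below is our own illustration: `apply_kind_i` and `apply_kind_iii` compute $a' = \lambda j.\, T$ and $a' = \lambda j.\, F$ when the guard of the corresponding T-formula holds and return None otherwise, while `last_one` finds the witness for $i_1$. On small instances it shows that the updates agree with the transformations of Table 5, as property (β) requires.

```python
from typing import List, Optional, Tuple

Cell = Tuple[str, int, int]   # (location, u, v) stored at chain position k


def encode(q: str, m: int, n: int, K: int) -> List[Cell]:
    """Chain encoding of (q, m, n): u = 1 iff k <= m, v = 1 iff k <= n."""
    return [(q, int(k <= m), int(k <= n)) for k in range(K + 1)]


def last_one(a: List[Cell]) -> Optional[int]:
    """Position i1 with pr2(a[i1]) = 1 and pr2(a[i1 + 1]) = 0, if any."""
    for k in range(len(a) - 1):
        if a[k][1] == 1 and a[k + 1][1] == 0:
            return k
    return None


def apply_kind_i(a: List[Cell], q: str, r: str) -> Optional[List[Cell]]:
    """T-formula for q -> (r, 1, 0): guard on i1, i2 = i1 + 1, i3 = i1 + 2."""
    i1 = last_one(a)
    if i1 is None or i1 + 2 >= len(a) or a[i1][0] != q or a[i1 + 2][1] != 0:
        return None                                   # guard not satisfied
    i2 = i1 + 1
    return [(r, 1, v) if j == i2 else (r, u, v) for j, (_, u, v) in enumerate(a)]


def apply_kind_iii(a: List[Cell], q: str, r: str, r_alt: str) -> Optional[List[Cell]]:
    """T-formula for q -> (r, -1, 0)[r']: position 0 plays the role of o."""
    i1 = last_one(a)
    if i1 is None or a[i1][0] != q:
        return None
    out = []
    for j, (_, u, v) in enumerate(a):
        if i1 != 0 and j == i1:
            out.append((r, 0, v))        # clear the last 1: m becomes m - 1
        elif i1 != 0:
            out.append((r, u, v))
        else:
            out.append((r_alt, u, v))    # i1 = o means m = 0: jump to r'
    return out


print(apply_kind_i(encode("q", 1, 0, 4), "q", "r") == encode("r", 2, 0, 4))          # True
print(apply_kind_iii(encode("q", 2, 1, 4), "q", "r", "z") == encode("r", 1, 1, 4))   # True
print(apply_kind_iii(encode("q", 0, 1, 4), "q", "r", "z") == encode("z", 0, 1, 4))   # True
print(apply_kind_i(encode("q", 2, 0, 3), "q", "r"))                                  # None
```

The last call shows why a large enough $S$-chain is needed: when the chain is too short there is no $i_3$ with $S(i_2, i_3)$, so the guard of the first T-formula cannot be met.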

We write the unsafe states formula $U_{q,m,n}$ in such a way that for every model $\mathcal{M}$ and
for every $s \in \mathrm{ARRAY}^{\mathcal{M}}$, the following happens:

(γ): if $\mathcal{M} \models U_{q,m,n}(s)$ and $s$ bi-simulates some machine configuration, then it
bi-simulates $(q, m, n)$.

This goal is achieved by taking $U_{q,m,n}$ to be the following formula (suppose $m \ge n$; the case
$m < n$ is symmetric):

$$\exists i_0 \cdots \exists i_{m+1}\, \Big(i_0 = o \wedge \bigwedge_{0 \le k \le m} S(i_k, i_{k+1}) \wedge \bigwedge_{0 \le k \le n} a[i_k] = (q, 1, 1) \wedge \bigwedge_{n < k \le m} a[i_k] = (q, 1, 0) \wedge a[i_{m+1}] = (q, 0, 0)\Big).$$

From (α)–(β)–(γ) above it is clear that $P$ reaches the configuration $(q, m, n)$ iff $S_P$ is unsafe
w.r.t. $U_{q,m,n}$, so that the latter is not decidable (for the left-to-right implication, take a run
in a model with a large enough $S$-chain starting with $o$). □



Undecidability of unrestricted satisfiability checking. We show that Hypothesis (I)
cannot be removed from the statement of Theorem 3.3 (and of Theorem A.1). We use a
reduction to the reachability problem for Minsky machines, as we have done for the proof
of the undecidability of the safety problem (Theorem 4.1); the argument is similar to one
used in [17].

Let $T_I$ be the theory having as class of models the natural numbers in the signature
with just zero, the successor function, and $<$. Notice that this theory is not locally finite. Let
$T_E$ be the theory having $Q \times \mathbb{N} \times \mathbb{N}$ as its unique structure. Here $Q$ is as in the previous
subsection of this Appendix. In the following, we freely use projections, sums, numerals, subtraction,
For simplicity, we assume that the signature $\Sigma_E$ is 4-sorted and endowed with the three projection
functions $pr_1, pr_2, pr_3$ mapping a data value $(r, u, v)$ to $r, u, v$, respectively: there is no need for this
assumption, but without it specifications become cumbersome.
constants for elements of $Q$, etc. Formally, all these operations can be defined in many ways
and the precise way is not relevant for the argument below. In other words, we can avoid
defining the signature $\Sigma_E$ precisely. This sloppiness is justified because we must use a $T_I$
not satisfying the local finiteness requirement from Hypothesis (I) of Theorem 3.3, whereas
we can use an arbitrary $T_E$. Let $\tau(a[j_1], a[j_2])$ abbreviate the disjunction, over the instructions
of $P$, of the following formulae describing the transformations from Table 5 (one formula for each
instruction of kind I or II, two for each instruction of kind III or IV):

$$pr_1(a[j_1]) = q \wedge a[j_2] = (r, pr_2(a[j_1]) + 1, pr_3(a[j_1]))$$

$$pr_1(a[j_1]) = q \wedge a[j_2] = (r, pr_2(a[j_1]), pr_3(a[j_1]) + 1)$$

$$pr_1(a[j_1]) = q \wedge pr_2(a[j_1]) > 0 \wedge a[j_2] = (r, pr_2(a[j_1]) - 1, pr_3(a[j_1]))$$

$$pr_1(a[j_1]) = q \wedge pr_2(a[j_1]) = 0 \wedge a[j_2] = (r', pr_2(a[j_1]), pr_3(a[j_1]))$$

$$pr_1(a[j_1]) = q \wedge pr_3(a[j_1]) > 0 \wedge a[j_2] = (r, pr_2(a[j_1]), pr_3(a[j_1]) - 1)$$

$$pr_1(a[j_1]) = q \wedge pr_3(a[j_1]) = 0 \wedge a[j_2] = (r', pr_2(a[j_1]), pr_3(a[j_1]))$$

Now consider the satisfiability of the following ∃∀-formula:

$$\exists a\, \exists i\, \exists j\, \Big(i = 0 \wedge a[i] = (q_0, 0, 0) \wedge a[j] = (q, m, n) \wedge \forall j_1\, \forall j_2\, \big(j_1 < j \wedge j_2 = j_1 + 1 \to \tau(a[j_1], a[j_2])\big)\Big)$$

Clearly, this formula is satisfiable iff the configuration $(q, m, n)$ is reachable: the array $a$ in fact
stores the whole computation leading to $(q, m, n)$. Thus satisfiability of ∃∀-formulae can be
undecidable if $T_I$ is not locally finite, even if the SMT($T_I$) and the SMT($T_E$) problems are
decidable (and even if $T_I$ is closed under substructures).
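The reduction can be illustrated by checking, on a finite prefix of the array, the matrix of the formula above with $i = 0$ and $j$ the last position. In the Python sketch below (our own illustration; the names `step` and `witnesses` are hypothetical), `step` plays the role of $\tau$ by implementing the transformations of Table 5, and the list `a` stands for the array $a$ restricted to the indexes $0, \ldots, j$; values at larger indexes are left unconstrained by the formula.

```python
from typing import List, Sequence, Tuple

Config = Tuple[str, int, int]          # (location q, register m, register n)
Instr = Tuple[str, str, str, str]      # (q, kind, r, r_alt) as in Table 5


def step(program: Sequence[Instr], c: Config) -> List[Config]:
    """Successors of c; tau(c, c2) holds iff c2 is in this list."""
    q, m, n = c
    out = []
    for src, kind, r, r_alt in program:
        if src != q:
            continue
        if kind == "I":
            out.append((r, m + 1, n))
        elif kind == "II":
            out.append((r, m, n + 1))
        elif kind == "III":
            out.append((r, m - 1, n) if m > 0 else (r_alt, m, n))
        elif kind == "IV":
            out.append((r, m, n - 1) if n > 0 else (r_alt, m, n))
    return out


def witnesses(program: Sequence[Instr], a: List[Config], q0: str, target: Config) -> bool:
    """Check a[0] = (q0, 0, 0), a[j] = target and, for all j1 < j,
    tau(a[j1], a[j1 + 1]), taking i = 0 and j = len(a) - 1."""
    j = len(a) - 1
    return (j >= 0 and a[0] == (q0, 0, 0) and a[j] == target
            and all(a[j1 + 1] in step(program, a[j1]) for j1 in range(j)))


P = [("q0", "I", "q1", ""), ("q1", "I", "q2", ""), ("q2", "III", "q2", "done")]
run = [("q0", 0, 0), ("q1", 1, 0), ("q2", 2, 0), ("q2", 1, 0), ("q2", 0, 0), ("done", 0, 0)]
print(witnesses(P, run, "q0", ("done", 0, 0)))                          # True
print(witnesses(P, [("q0", 0, 0), ("q2", 2, 0)], "q0", ("q2", 2, 0)))   # False
```

Deciding whether some array over the natural numbers passes this check is exactly the reachability question, which is why no decision procedure can exist in this setting.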

A final observation is crucial. If we keep local finiteness and drop closure under
substructures in the statement of Hypothesis (I) from Theorem 3.3, then the above
counterexample still applies! In fact, the successor function for indexes is used only in
$j_2 = j_1 + 1$ occurring in the formula above: we can replace the application of the successor
function with a binary relation $S(j_1, j_2)$ so as to recover local finiteness. However, closure
under substructures is lost, as the structure of natural numbers has proper substructures if
successor is a relation and not a function, and these substructures must be excluded for the
above argument to work (from satisfiability in such substructures a full computation cannot be
recovered). Thus, we can conclude that the two conditions of Hypothesis (I) are strictly
connected and both needed.